The purpose of this walk-through is to improve the transparency and replicability of the analysis for the study Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores (in press). This digital document contains all the code and materials utilized in the study. Furthermore, the walk-through meticulously follows the When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics checklist (WAMBS checklist) developed by Depaoli & van de Schoot (2017). This checklist outlines the ten crucial points that need careful scrutiny when employing Bayesian inference procedures.
WAMBS checklist
Questionnaire outlining the ten crucial points that need careful scrutiny when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis (Depaoli & van de Schoot, 2017).
2 Organization
In this walk-through, Section 3 introduces various background topics that are relevant to the present study and enable readers to progress smoothly through this research. Specifically, Section 3.1 provides a brief explanation of how Bayesian inference procedures work and their importance for this research. Section 3.2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role in modeling bounded data. Section 3.3 explains (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. Section 3.4 illustrates the concept of measurement error and the role of latent variables in overcoming the problems arising from it. Lastly, Section 3.5 explains the effects of distributional departures in the data on the parameter estimates, and their importance for this research.
The specific analyses for this study are elaborated from Section 4 onwards. In particular, Section 4 elaborates on the general context, gaps, and main purpose of the study. Section 5 introduces the research questions that guide this study. Section 6 explores the data and its implications. Section 7 thoroughly develops the methods used to analyze the data. Section 8 provides answers to the research questions at hand. Section 9 discusses the findings, limitations, and future research derived from this study. Lastly, Section 10 provides the concluding thoughts for the study.
Bayesian inference is an approach to statistical modeling and inference that is primarily based on Bayes’ theorem. The procedure aims to derive appropriate inference statements about a set of parameters by revising and updating their occurrence probabilities in light of new evidence (Everitt & Skrondal, 2010). The procedure consists of defining the model assumptions in the form of a likelihood for the outcome and a set of prior distributions for the parameters of interest. Upon observing empirical data, these priors are updated to posterior distributions following Bayes’ rule (Jeffreys, 1998), from which the statistical inferences are derived. As an example, a simple linear regression model with a parameter \beta can be encoded under the Bayesian inference paradigm in the following form:
Bayesian inference
Approach to statistical modeling and inference, that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence (Everitt & Skrondal, 2010).
\begin{align*}
P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) }
\end{align*}
\tag{1}
where P( Y | \beta, X ) defines the likelihood of the outcome, which represents the assumed probability distribution for the outcome Y, given the parameter \beta and covariate X, i.e., the distribution that describes the assumption about the underlying process that gives rise to the data (Everitt & Skrondal, 2010).
Likelihood
Probability distribution that describes the assumption about the underlying process that gives rise to the data (Everitt & Skrondal, 2010).
P( \beta ) defines the prior distribution of the parameter \beta. A prior is a probability distribution summarizing the information about a parameter known or assumed before observing any empirical data (Everitt & Skrondal, 2010).
Prior distribution
Probability distribution summarizing the information about a parameter known or assumed before observing any empirical data (Everitt & Skrondal, 2010).
P( Y ) defines the probability distribution of the data, which represents the evidence of the observed empirical data.
As a result, P( \beta | Y, X ), which denotes the posterior distribution of the parameter, describes the probability distribution of \beta after observing empirical data.
Posterior distribution
Probability distribution summarizing the information about a parameter after observing empirical data (Everitt & Skrondal, 2010).
Before implementing the Bayesian inference procedures, two important concepts related to Equation 1 need to be understood. First, the evidence of the empirical data P(Y) serves as a normalizing constant. This is just another way of saying that the numerator in the equation is rescaled by a constant obtained from calculating P(Y). Consequently, without loss of generality, the equation can be succinctly rewritten in the following form:
\begin{align*}
P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta )
\end{align*}
\tag{2}
where \propto denotes the proportionality symbol. This implies that the posterior distribution of \beta is proportional (up to a constant) to the product of the outcome’s likelihood and the parameter’s prior distribution. This definition makes the calculation of the posterior distribution easier, by separating the parameter’s updating process from the integration of new empirical data (this will be clearly seen in the code provided in Section 3.1.3).
Second, a dataset usually has multiple observations of the outcome Y and covariate X, in the form of y_{i} and x_{i}. Therefore, by the laws of probability and assuming independence among the observations, the likelihood of the full dataset can be rewritten as the product of all individual observation likelihoods. Consequently, Equation 2 can also be rewritten as follows:
\begin{align*}
P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta )
\end{align*}
\tag{3}
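In practice, the product in Equation 3 is evaluated on the log scale, since multiplying a hundred small densities underflows floating-point arithmetic; summing log likelihoods is numerically safe. A minimal Python sketch with hypothetical simulated data (not the study’s data or code):

```python
import math
import random

random.seed(1)

# hypothetical data: y_i = 0.2 * x_i + standard normal noise
x = [random.gauss(0, 1) for _ in range(100)]
y = [0.2 * xi + random.gauss(0, 1) for xi in x]

def normal_logpdf(value, mean, sd):
    """Log density of a Normal(mean, sd) evaluated at `value`."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (value - mean)**2 / (2 * sd**2)

def log_likelihood(beta, x, y):
    """Log of the product of individual likelihoods: sum_i log P(y_i | beta, x_i)."""
    return sum(normal_logpdf(yi, beta * xi, 1.0) for xi, yi in zip(x, y))

# the direct product of 100 densities would be a vanishingly small number,
# but its logarithm is perfectly well behaved
print(log_likelihood(0.2, x, y))
```

Because only ratios of posterior values matter (Equation 2 holds up to a constant), the dropped normalizing constant never needs to be computed.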
3.1.2 Estimation methods
Several methods within the Bayesian inference procedures can be utilized to estimate the posterior distribution of the parameter, and most of these fall into the category of Markov Chain Monte Carlo (MCMC) methods. MCMC methods indirectly simulate random observations from probability distributions using stochastic processes (Everitt & Skrondal, 2010).
Markov Chain Monte Carlo (MCMC)
Methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt & Skrondal, 2010).
However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the grid approximation method. Through this method, an excellent approximation of the parameter’s posterior distribution can be achieved by considering a finite candidate list of parameter values. This method is used in Section 3.1.3 to illustrate how Bayesian inference works.
Grid approximation
Method to indirectly simulate random observations from low-dimensional continuous probability distributions by considering a finite candidate list of parameter values (McElreath, 2020).
3.1.3 How does it work?
A simple Bayesian linear regression model can be written in the following form:
\begin{align*}
y_{i} &= \beta \cdot x_{i} + e_{i} \\
e_{i} &\sim \text{Normal}( 0, 1 ) \\
\beta &\sim \text{Uniform}( -20, +20 )
\end{align*}
where y_{i} denotes the outcome’s observation i, \beta the expected effect of the observed covariate x_{i} on the outcome, and e_{i} the outcome’s residual in observation i. Furthermore, the model assumes the residual e_{i} is normally distributed with mean zero and standard deviation equal to one. Lastly, prior to observing any data, it is assumed that \beta is uniformly distributed within the range [-20,+20].
However, a more convenient generalized manner to represent the same linear regression model is as follows:
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, +20 )
\end{align*}
In this definition, the components of the Bayesian inference procedure detailed in Section 3.1.1 are more easily spotted. First, regarding the likelihood, the outcome is assumed to be normally distributed with mean \mu_{i} and standard deviation equal to one. Second, \beta is assumed to have a uniform prior within the range [-20,+20]. Additionally, the equations reveal that the mean of the outcome \mu_{i} is modeled by a linear predictor composed of the covariate x_{i} and its effect on the outcome \beta.
For illustration purposes, a simulated regression with n=100 observations was generated assuming \beta=0.2. Figure 1 shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of \beta. Two noteworthy results emerge from the approach. First, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Second, when considering a dataset with n=100 observations, the influence of the prior on the posterior distribution of \beta is negligible. Specifically, prior to observing any data, assuming that \beta could take any value within the range [-20,+20] with equal probability (left panel of Figure 2) did not have a substantial impact on the distribution of \beta after empirical data was observed (right panel of Figure 2).
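The simulation and grid approximation just described can be sketched as follows. This is an illustrative Python reconstruction under the stated assumptions (n=100, \beta=0.2, a uniform prior on [-20,+20]), not the study’s original code:

```python
import math
import random

random.seed(42)

# simulate n = 100 observations with true beta = 0.2
n, true_beta = 100, 0.2
x = [random.gauss(0, 1) for _ in range(n)]
y = [true_beta * xi + random.gauss(0, 1) for xi in x]

# finite candidate list of parameter values (the "grid"): -20 to +20 by 0.01
grid = [b / 1000 for b in range(-20000, 20001, 10)]

def log_lik(beta):
    # product of individual normal likelihoods, in log space
    # (the constant term is dropped; it cancels in the normalization)
    return sum(-0.5 * (yi - beta * xi) ** 2 for xi, yi in zip(x, y))

# uniform prior: every candidate gets the same prior weight,
# so the posterior is proportional to the likelihood alone
log_post = [log_lik(b) for b in grid]
m = max(log_post)
post = [math.exp(lp - m) for lp in log_post]   # unnormalized weights
total = sum(post)
post = [p / total for p in post]               # normalized over the grid

# posterior summary: mean of the grid-approximated distribution
post_mean = sum(b * p for b, p in zip(grid, post))
print(round(post_mean, 2))
```

The posterior mean lands close to the true value 0.2, up to sampling noise, illustrating how little a flat prior contributes once n=100 observations are in hand.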
Assuming, prior to observing empirical data, that the parameter could take any value within the range [-20,+20] with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept is illustrated in Figure 3 through Figure 5, where three different types of priors are used to encode three levels of uncertainty about the parameter \beta.
The code for these figures (elided here) follows these annotated steps: a user-defined function builds the linear predictor for each candidate value; the linear predictor is calculated for each candidate; a user-defined function computes the product of the individual observation likelihoods; the outcome data likelihood is evaluated; three priors are defined (prior 1: uniform with min=-20 and max=+20; prior 2: normal with mean=0 and sd=0.5; prior 3: normal with mean=0.2 and sd=0.05); finally, the posterior distribution is computed for each prior.
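These annotated steps can be sketched in Python as follows (an illustrative reconstruction, not the study’s original code): the same grid-approximated likelihood is combined with each of the three priors.

```python
import math
import random

random.seed(7)

# simulated data, n = 100, true beta = 0.2
x = [random.gauss(0, 1) for _ in range(100)]
y = [0.2 * xi + random.gauss(0, 1) for xi in x]

# finite candidate list of parameter values in [-20, +20]
grid = [b / 100 for b in range(-2000, 2001)]

def normal_logpdf(v, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd**2) - (v - mean)**2 / (2 * sd**2)

# outcome data log likelihood for each candidate
log_lik = [sum(normal_logpdf(yi, b * xi, 1.0) for xi, yi in zip(x, y))
           for b in grid]

# the three priors, evaluated on the grid (log scale)
priors = {
    "uniform(-20, 20)":  [math.log(1 / 40)] * len(grid),
    "normal(0, 0.5)":    [normal_logpdf(b, 0.0, 0.5) for b in grid],
    "normal(0.2, 0.05)": [normal_logpdf(b, 0.2, 0.05) for b in grid],
}

# posterior distribution for each prior: likelihood * prior, renormalized
posterior_means = {}
for name, log_prior in priors.items():
    log_post = [ll + lp for ll, lp in zip(log_lik, log_prior)]
    m = max(log_post)
    w = [math.exp(v - m) for v in log_post]
    posterior_means[name] = sum(b * wi for b, wi in zip(grid, w)) / sum(w)
    print(f"{name}: posterior mean ~ {posterior_means[name]:.2f}")
```

With n=100 observations, the non-informative and weakly-informative priors give nearly identical posterior means, while the informative prior pulls the posterior firmly toward 0.2, mirroring the pattern in Figures 3 through 5.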
First, the distribution depicted in Figure 3 assumes \beta \sim \text{Uniform}(-20, +20) (similar to what is observed in Section 3.1.3). This distribution does not constrain the effect of \beta to be more probable in any particular range within [-20, +20]. This type of distribution is commonly referred to as a non-informative prior. A non-informative prior reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space (Everitt & Skrondal, 2010).
Non-informative priors
Prior that reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space (Everitt & Skrondal, 2010).
Figure 3: Bayesian inference: posterior distributions with non-informative prior distribution.
Second, the distribution described in Figure 4 assumes \beta \sim \text{Normal}(0, 0.5). Consequently, the effect of \beta is more probable within the range [-1,+1], with less probability associated with parameter values outside this range. This is an example of a weakly-informative prior distribution. Weakly informative priors reflect the distributional commitment of a parameter to a weakly constrained range of values within a realistic parameter space (McElreath, 2020).
Weakly informative priors
Prior that reflects the distributional commitment of a parameter to a weakly constrained range of values within a realistic parameter space (McElreath, 2020).
Figure 4: Bayesian inference: posterior distributions with weakly-informative prior distribution.
Third, the distribution described in Figure 5 assumes \beta \sim \text{Normal}(0.2, 0.05). As a result, the effect of \beta is more probable within the range [0.1,0.3], with less probability associated with parameter values outside this range. This is an example of an informative prior distribution. Informative priors are distributions that express specific and definite information about a parameter (McElreath, 2020).
Informative priors
Prior distributions that express specific and definite information about a parameter (McElreath, 2020).
Figure 5: Bayesian inference: posterior distributions with informative prior distributions.
Lastly, regarding the influence of different priors on the posterior distributions, Figure 3 and Figure 4 reveal that non-informative and weakly-informative priors have a negligible influence on the posterior distribution; both priors result in similar posteriors. Furthermore, the figures show that a data sample size of n=100 is still not enough to provide an unbiased and precise estimate of the true effect. In contrast, Figure 5 shows that informative priors can have a meaningful influence on the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. These results show that when the data sample size is not sufficiently large, the prior assumptions can play a significant role in obtaining appropriate parameter estimates.
3.1.5 What are Hyperpriors?
In cases requiring greater modeling flexibility, a more refined representation of the parameters’ priors can be defined in terms of hyperparameters and hyperpriors. Hyperparameters refer to parameters indexing a family of possible prior distributions for the original parameter, while hyperpriors are prior distributions for such hyperparameters (Everitt & Skrondal, 2010).
Hyperparameters
Parameters \theta_{2} that index a family of possible prior distributions for another parameter \theta_{1} (Everitt & Skrondal, 2010).
A simple example of the use of hyperpriors would be to define the regression model shown in Section 3.1.3 in the following form:
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\
v &\sim \text{Normal}(0, 3)
\end{align*}
where v defines the hyperparameter for the parameter \beta, and its associated distribution defines its hyperprior.
However, setting prior distributions through hyperparameters brings its own challenges. One notable challenge pertains to the geometry of the parameter’s sample space: prior probabilistic representations defined in terms of hyperparameters sometimes exhibit simpler sample geometries compared to simple priors. The re-parametrization of priors into such simpler sample geometries leads to the notion of non-centered priors. In this approach, a parameter’s prior distribution is expressed in terms of a hyperparameter, which is defined by a transformation of the original parameter of interest (Gorinova et al., 2019). By incorporating non-centered priors, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures. To illustrate, a straightforward example of a non-centered reparametrization of a prior can be demonstrated as follows:
Non-centered priors
Expression of a parameter’s distribution in terms of a hyperparameter defined by a transformation of the original parameter of interest (Gorinova et al., 2019).
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &= z \cdot \text{exp}(v) \\
v &\sim \text{Normal}(0, 3) \\
z &\sim \text{Normal}( 0, 1 )
\end{align*}
where z is a hyperparameter sampled independently from v, and the parameter of interest \beta is obtained as a transformation of the two hyperparameters. Figure 6 illustrates the differences in sampling geometries between a centered and a non-centered parametrization. It is evident that the sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel; as a result, Bayesian inference procedures have a harder time sampling from the former distribution than from the latter.
Figure 6: Centered and non-centered parameter spaces
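The contrast between the two parametrizations can also be checked numerically by drawing from the priors under each form. In the sketch below (illustrative Python with a hypothetical number of draws), the scale of \beta is tied to v under the centered form, while the quantities actually sampled under the non-centered form, z and v, are independent by construction:

```python
import math
import random

random.seed(0)
n = 50_000

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((xa - ma) * (xb - mb) for xa, xb in zip(a, b))
    va = math.sqrt(sum((xa - ma) ** 2 for xa in a))
    vb = math.sqrt(sum((xb - mb) ** 2 for xb in b))
    return cov / (va * vb)

# centered: beta ~ Normal(0, exp(v)), so beta's scale depends on v
v = [random.gauss(0, 3) for _ in range(n)]
beta_centered = [random.gauss(0, math.exp(vi)) for vi in v]

# non-centered: z ~ Normal(0, 1) independent of v, and beta = z * exp(v)
z = [random.gauss(0, 1) for _ in range(n)]

# under the centered form, the magnitude of beta is strongly tied to v...
c_centered = corr([math.log(abs(b)) for b in beta_centered], v)
# ...while the quantities the sampler explores in the non-centered form
# (z and v) are uncorrelated by construction
c_noncentered = corr(z, v)
print(round(c_centered, 2), round(c_noncentered, 2))
```

This dependence between \beta and v is exactly the narrow funnel-shaped geometry in the left panel of Figure 6; sampling z and v independently and transforming afterwards removes it.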
3.1.6 Importance
The selection of the Bayesian approach was based on three key properties. Firstly, empirical evidence from prior research demonstrates that Bayesian methods outperform frequentist methods, particularly in handling complex and over-parameterized models (Baker, 1998; Kim & Cohen, 1999). This superiority is evident when dealing with complex models, like the proposed GLLAMM, that are challenging to program or are not viable under frequentist methods (Depaoli, 2014).
Secondly, the approach allows for the incorporation of prior information, ensuring that certain parameters are confined within specified boundaries. This helps mitigate non-convergence or improper parameter estimation issues commonly observed in complex models under frequentist methods (Martin & McDonald, 1975; Seaman & Stamey, 2011). In this study, for example, this property was leveraged to incorporate information about the variances of random effects and constrain them to be positive.
Lastly, the Bayesian approach demonstrates proficiency in handling relatively small sample sizes (Baldwin & Fellingham, 2013; Depaoli, 2014; Lambert, Sutton, Burton, Abrams, & Jones, 2006). In this case, despite the study dealing with 2,263 entropy scores, these were derived from a modest sample size of 32 speakers, from whom the inferences are drawn. Consequently, reliance on the asymptotic properties of frequentist methods may not be warranted in this context, underscoring the pertinence of this property to the current study.
Benefits of Bayesian inference procedures
More suitable to deal with:
Complex or highly-parameterized models
Parameter constraints
Small sample sizes
3.2 A tale of two distributions
3.2.1 The normal distribution
A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line \left( y_{i} \in (-\infty, \infty) \right). The distribution is characterized by two independent parameters: the mean \mu and the standard deviation \sigma (Everitt & Skrondal, 2010). Thus, a random variable can take on values gathered around a mean \mu, with some values dispersed based on some amount of deviation \sigma, without any restriction. Importantly, by definition of the normal distribution, the location (mean) of the distribution does not influence its spread (deviation).
Figure 7 illustrates how the distribution of an outcome changes with different values of \mu and \sigma. The left panel demonstrates that the distribution of the outcome can shift in location based on the value of \mu. The right panel shows how the distribution of the outcome can become narrower or wider based on the value of \sigma. It is noteworthy that alterations in the mean \mu of the distribution have no impact on its standard deviation \sigma.
The code for this figure (elided here) plots the normal distribution with different values of \mu and \sigma=1, and with \mu=0 and different values of \sigma.
Figure 7: Normal distribution with different mean and standard deviations
3.2.2 The beta-proportion distribution
A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one \left( y_{i} \in [0, 1] \right). The distribution is characterized by two parameters: the mean \mu and the sample size M (Everitt & Skrondal, 2010). This implies that a random variable can take on values restricted within the unit interval, centered around a mean \mu, with some values being more dispersed based on the sample size M. Additionally, two characteristics define the distribution. First, like the random variable, the mean of the distribution can only take values within the unit interval (\mu \in [0,1]). Second, the mean and sample size parameters are no longer independent of each other.
Figure 8 illustrates how an outcome with a beta-proportion distribution changes with different values of \mu and M. The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.
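The mean/sample-size parametrization maps onto the standard Beta(\alpha, \beta) shape parameters via \alpha = \mu M and \beta = (1-\mu)M, which makes both patterns in Figure 8 easy to verify numerically. A small sketch with illustrative parameter values:

```python
def beta_proportion_moments(mu, M):
    """Mean and variance of a beta-proportion distribution with
    mean mu in (0, 1) and sample size M > 0."""
    alpha, beta = mu * M, (1 - mu) * M   # standard Beta shape parameters
    mean = alpha / (alpha + beta)        # equals mu
    var = mu * (1 - mu) / (M + 1)        # dispersion depends on BOTH mu and M
    return mean, var

# (1) the spread depends on the mean: same M, different mu
print(beta_proportion_moments(0.5, 10))   # variance is largest at mu = 0.5
print(beta_proportion_moments(0.9, 10))   # smaller variance near the bounds

# (2) the larger the sample size M, the less dispersed the distribution
print(beta_proportion_moments(0.5, 100))
```

The variance formula \mu(1-\mu)/(M+1) makes the dependence explicit: unlike the normal distribution, moving the mean toward either bound necessarily shrinks the spread.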
3.2.3 Importance
The significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normally distributed bounded outcomes, such as the entropy scores utilized in this study. Boundedness refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022). Neglecting the bounded nature of an outcome can lead, at best, to underfitting and, at worst, to misspecification. Underfitting occurs when statistical models fail to capture the underlying data patterns, potentially generating predictions outside the data range and hindering the model’s ability to generalize its results when confronted with new data. Conversely, misspecification, marked by a poor representation of relevant aspects of the true data in the model’s functional form or covariate inclusion, can lead to inconsistent and inefficient parameter estimates (Everitt & Skrondal, 2010).
Boundedness
Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022)
Underfitting
Occurs when statistical models fail to capture the underlying data patterns, potentially generating predictions outside the data range and hindering the model’s ability to generalize its results when confronted with new data (Everitt & Skrondal, 2010).
Misspecification
Occurs when the model’s functional form or inclusion of covariates poorly represents relevant aspects of the true data. This can lead to inconsistent and inefficient parameter estimates (Everitt & Skrondal, 2010).
3.3 Linear Mixed Models
3.3.1 The ordinary LMM
An ordinary linear mixed model (LMM) is a procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations and one or more covariates (Holmes, Bolin, & Kelley, 2019). A commonly known Bayesian probabilistic representation of an ordinary LMM can be expressed as follows:
Ordinary linear mixed model (LMM)
Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes et al., 2019).
\begin{align*}
y_{ib} &= \beta \cdot x_{i} + a_{b} + \varepsilon_{ib} \\
\varepsilon_{ib} &\sim \text{Normal}( 0, 1 ) \\
\beta &\sim \text{Normal}( 0, 0.5 ) \\
a_{b} &\sim \text{Normal}( 0, 1 )
\end{align*}
where y_{ib} denotes the outcome’s i’th observation clustered in block b, and x_{i} denotes the covariate for observation i. Moreover, \beta denotes the fixed slope of the regression, a_{b} denotes the random effects, and \varepsilon_{ib} defines the random outcome residuals. The residuals \varepsilon_{ib} are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, \beta is assumed to be normally distributed with mean zero and standard deviation equal to 0.5. Similarly, a_{b} is assumed to be normally distributed with mean zero and standard deviation equal to one.
3.3.2 The generalized LMM
A generalized linear mixed model (GLMM) is one of a class of models used to estimate (non)linear relationships between the mean of a (non)normally distributed outcome with clustered observations and one or more covariates (Lee & Nelder, 1996). Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of a GLMM, as follows:
Generalized linear mixed model (GLMM)
Procedure employed to estimate (non)linear relationships between the mean of a (non)normally distributed outcome with clustered observations and one or more covariates (Lee & Nelder, 1996).
\begin{align*}
y_{ib} &\sim \text{Normal}( \mu_{ib}, 1 ) \\
\mu_{ib} &= \beta \cdot x_{i} + a_{b} \\
\beta &\sim \text{Normal}( 0, 0.5 ) \\
a_{b} &\sim \text{Normal}( 0, 1 )
\end{align*}
Notice this representation explicitly highlights the three components of a GLMM: the likelihood component, the linear predictor, and the link function (McElreath, 2020). The likelihood component specifies the assumption about the distribution of the outcome, in this case a normal distribution with mean \mu_{ib} and standard deviation equal to one. The linear predictor specifies the manner in which the covariates predict the mean of the outcome; in this case it is a linear combination of the parameter \beta, the covariate x_{i}, and the random effects a_{b}. The link function specifies the relationship between the mean of the outcome \mu_{ib} and the linear predictor; in this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values within the real line (refer to Section 3.2.1). Lastly, resulting from the use of Bayesian procedures, a fourth component can be added to any GLMM: the prior distributions. The priors describe what is known about the parameters \beta and a_{b} before observing any empirical data.
GLMM components
Likelihood component
Linear predictor
Link function
On the other hand, a Beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:
\begin{align*}
y_{ib} &\sim \text{BetaProportion}( \mu_{ib}, 10 ) \\
\mu_{ib} &= \text{logit}^{-1}( \beta \cdot x_{i} + a_{b} ) \\
\beta &\sim \text{Normal}( 0, 0.5 ) \\
a_{b} &\sim \text{Normal}( 0, 1 )
\end{align*}
Notice the representation also highlights the three components of a GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome, with mean \mu_{ib} and sample size equal to 10. The linear predictor is still a linear combination of the parameter \beta, the covariate x_{i}, and the random intercepts a_{b}. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by an inverse-logit function: \text{logit}^{-1}(x) = \text{exp}(x) / (1+\text{exp}(x)). The inverse-logit function allows the linear predictor to match the range of the mean of the beta-proportion distribution, \mu_{ib} \in [0,1] (refer to Section 3.2.2). Lastly, as the additional fourth component resulting from the use of Bayesian procedures, the prior assumptions for \beta and a_{b} are also declared.
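The inverse-logit link can be sketched directly: whatever real value the linear predictor takes, the implied mean lands inside the unit interval, as the bounded outcome requires. A minimal illustration with hypothetical parameter values:

```python
import math

def inv_logit(x):
    """logit^{-1}(x) = exp(x) / (1 + exp(x)), mapping the real line to (0, 1)."""
    return math.exp(x) / (1.0 + math.exp(x))

# hypothetical linear predictor values: beta * x_i + a_b
beta, a_b = 0.8, -0.3
for x_i in (-10.0, 0.0, 10.0):
    eta = beta * x_i + a_b   # unbounded linear predictor
    mu = inv_logit(eta)      # mean of the beta-proportion outcome, inside (0, 1)
    print(f"eta = {eta:+.1f} -> mu = {mu:.4f}")
```

However extreme the covariate, the implied mean never escapes the unit interval; this is precisely what the identity link of the ordinary LMM cannot guarantee for bounded outcomes.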
3.3.3 Importance
Understanding LMMs is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field (see Boonen et al., 2021; Flipsen, 2006; Lagerberg et al., 2014). Furthermore, their significance also lies in their ability to model clustered outcomes. Clustering occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020). Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Such biases and inefficiencies can diminish statistical power or increase the likelihood of committing a type I error. Statistical power is the model’s ability to reject the null hypothesis when it is false (Everitt & Skrondal, 2010). A type I error occurs when a null hypothesis is erroneously rejected (Everitt & Skrondal, 2010).
Clustering
Occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020).
Type I error
The error that results when a null hypothesis is erroneously rejected (Everitt & Skrondal, 2010).
Moreover, the significance of GLMMs lies in offering the same benefits as LMMs in terms of parameter unbiasedness and efficiency. However, the framework also allows for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to Section 3.2.3 to understand the importance of considering the bounded nature of the data in the modeling process.
3.4 Measurement error in an outcome
3.4.1 What is the problem?
Measurement error refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed true value which is not directly observable (Everitt & Skrondal, 2010). The problem of measurement error in an outcome is easier to understand with a motivating example. Using a similar model as the one depicted in Section 3.1.3, the probabilistic representation of measurement error in the outcome can be depicted as follows:
Measurement error
It refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed true value which is not directly observable (Everitt & Skrondal, 2010).
\begin{align*}
\tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
This representation effectively means that a manifest outcome \tilde{y}_{i} is assumed to be normally distributed with a mean equal to the latent outcome y_{i} and a measurement error s. The latent outcome y_{i} is also assumed to be normally distributed but with a mean \mu_{i} and a standard deviation of one. The mean of the latent outcome is considered to be explained by a linear combination of the covariate x_{i} and its expected effect \beta. Lastly, prior to observing any data, \beta is assumed to follow a uniform distribution within the range of [-20, +20], representing a non-informative prior.
For illustrative purposes, a simulated outcome with n=100 observations was generated, assuming \beta=0.2, and a measurement error of s=2. Figure 9 shows the scatter plot of the generated data (see code below). The left panel of the figure demonstrates that the manifest outcome has a larger spread than the latent outcome depicted in the right panel. As a result, although \beta is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected due to this larger variability.
The estimation output confirms the previous hypothesis. The posterior distribution of \beta, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see Figure 10 and code output below). Furthermore, the code output shows the parameter’s posterior distribution can no longer reject the null hypothesis at confidence levels of 90\% and 95\%, indicating a reduced statistical power.
Figure 10: Bayesian inference: grid approximation on measurement error outcomes
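The mechanism behind this result can be sketched with a quick simulation (illustrative Python under the stated assumptions n=100, \beta=0.2, s=2, not the study’s original code): in expectation, the manifest outcome’s variance equals the latent outcome’s variance plus s^2, and this extra spread is what widens the posterior of \beta.

```python
import random
import statistics

random.seed(123)

n, beta, s = 100, 0.2, 2.0
x = [random.gauss(0, 1) for _ in range(n)]

# latent outcome: y_i ~ Normal(beta * x_i, 1)
latent = [random.gauss(beta * xi, 1.0) for xi in x]

# manifest outcome: y~_i ~ Normal(y_i, s), i.e. the latent value plus error
manifest = [random.gauss(yi, s) for yi in latent]

sd_latent = statistics.stdev(latent)
sd_manifest = statistics.stdev(manifest)

# the manifest outcome is visibly more spread out than the latent one;
# in expectation, Var(manifest) = Var(latent) + s^2
print(round(sd_latent, 2), round(sd_manifest, 2))
```

The roughly doubled standard deviation of the manifest outcome corresponds to the wider scatter in the left panel of Figure 9 and the wider posterior in Figure 10.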
3.4.2 How to solve it?
Latent variables can be used to address the problem arising from the larger observed variability in one or more manifest outcomes. A latent variable is a variable that cannot be directly measured but is assumed to be primarily responsible for the variability in one or more manifest variables (Everitt & Skrondal, 2010). Latent variables can be interpreted as hypothetical constructs, traits, or true variables that account for the variability that induces dependence in one or more manifest variables (Rabe-Hesketh, Skrondal, & Pickles, 2004). This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes (Rabe-Hesketh et al., 2004) (refer to Section 3.3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).
Latent variables
Variables that cannot be measured directly but are assumed to be principally responsible for the common variability in one or more manifest variables (Everitt & Skrondal, 2010).
Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model (Luce, 1959; Thurstone, 1927) are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships (Hoyle, 2014). While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part (Rabe-Hesketh et al., 2004).
3.4.3 Importance
It becomes evident that when an outcome is measured with error, the estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision. Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a type II error, which occurs when a null hypothesis is erroneously accepted (Everitt & Skrondal, 2010).
Type II error
The error that results when a null hypothesis is erroneously accepted (Everitt & Skrondal, 2010).
Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker’s (latent) potential intelligibility contributes, in part, to the observed variability in the speaker’s (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates to test the hypothesis of interest.
3.5 Distributional departures
3.5.1 Heteroscedasticity
In the context of regression analysis, heteroscedasticity occurs when the variance of an outcome depends on the values of another variable (Everitt & Skrondal, 2010). The opposite case is called homoscedasticity. An example of heteroscedasticity can be probabilistically represented as follows:
Heteroscedasticity
Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called homoscedasticity (Everitt & Skrondal, 2010).
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\sigma_{i} &= exp( \gamma \cdot x_{i} ) \\
\beta &\sim \text{Uniform}( -20, 20 ) \\
\gamma &\sim \text{Uniform}( -20, 20 )
\end{align*}
This representation implies that an outcome y_{i} is assumed normally distributed with mean \mu_{i} and standard deviation \sigma_{i}. Furthermore, the mean and standard deviation of the outcome are explained by the covariate x_{i}, through the parameters \beta and \gamma. Lastly, prior to observing any data, \beta and \gamma are assumed to be uniformly distributed in the range [-20,+20].
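This data-generating process can be simulated directly. The sketch below uses the same values as the illustration that follows (n=100, \beta=0.2, \gamma=1); the covariate grid and seed are arbitrary choices for display.

```r
# Simulating heteroscedastic data from the representation above
# (n = 100, beta = 0.2, gamma = 1; seed and covariate grid are illustrative)
set.seed(1)
n     <- 100
x     <- seq(0, 2, length.out = n)
beta  <- 0.2
gamma <- 1
y <- rnorm(n, mean = beta * x, sd = exp(gamma * x))

# The outcome's spread increases with the covariate
sd_low  <- sd(y[x <= 1])
sd_high <- sd(y[x >  1])
```

Comparing `sd_low` and `sd_high` shows the variability of y growing with x, which is the signature of heteroscedasticity.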
Figure 11 illustrates the presence of heteroscedasticity using the previous representation, assuming a sample size of n=100, and parameters \beta=0.2 and \gamma=1. Notice the variability of the outcome increases as the covariate increases. Consequently, it is easy to intuit that this difference in the outcome’s variability could have an impact on the statistical hypothesis tests of \beta, and even on the estimate itself. To prove the intuition, an incorrect model is used to estimate \beta.
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
As a result, the intuition is proven accurate. When an outcome is erroneously assumed homoscedastic, the parameter estimates not only become inefficient but are also estimated farther from the true value, as seen in the code output below and in Figure 12.
3.5.2 Outliers
In regression analysis, outliers are defined as observations that appear to deviate markedly from the other sample data points among which they occur (Everitt & Skrondal, 2010). Although outliers admit no unique probabilistic representation, a simple example is illustrated in Figure 13. The figure depicts the presence of three influential observations in the outcome (colored blue). It is easy to intuit that, in the presence of influential observations, the parameter estimates, and the hypothesis tests resulting from them, can be affected.
Outlier
An observation that appears to deviate markedly from the other sample data points among which it occurs (Everitt & Skrondal, 2010).
The intuition is proven correct when \beta is estimated using the same incorrect model used in Section 3.5.1. When an outcome is erroneously assumed to be free of outliers, the parameter value is estimated farther from the true value, as observed in the code output below and in Figure 14.
3.5.3 Robust models
As recommended by McElreath (2020), robust models can be used to deal with these types of distributional departures. Robust models are a general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate departures of the data from the model’s assumptions (Everitt & Skrondal, 2010). The procedure consists of modifying the statistical models to include traits that effectively make them robust to small departures from the distributional assumptions, such as heteroscedastic errors or the presence of outliers.
Robust models
A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumptions of a model (Everitt & Skrondal, 2010).
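The robustness principle can be made concrete with a toy contrast between a robust and a non-robust summary statistic. This is only an illustration of the idea of reduced sensitivity to influential observations, not one of the study's models; the data and the three added outliers are invented for display.

```r
# Toy illustration of robustness: a robust summary (median) is far less
# sensitive to a few influential observations than a non-robust one (mean)
set.seed(1)
y_clean   <- rnorm(97)
y_outlier <- c(y_clean, 15, 18, 20)   # three influential observations

shift_mean   <- abs(mean(y_outlier)   - mean(y_clean))    # shifts noticeably
shift_median <- abs(median(y_outlier) - median(y_clean))  # barely moves
```

Robust regression models apply the same logic at the level of the likelihood, reducing the pull that a few extreme observations exert on the parameter estimates.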
3.5.4 Importance
It is known that dealing with heteroscedasticity and the identification of outliers through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately bias the parameter estimates, and even render them inefficient (McElreath, 2020). Bias refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt & Skrondal, 2010).
Bias
The extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt & Skrondal, 2010).
Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study, because there is an interest in testing hypotheses about the potential intelligibility of speakers. Therefore, it is necessary to consider robust regression models to assess these distributional departures and generate unbiased parameter estimates.
4 Introduction
Intelligibility is at the core of successful, felicitous communication. Thus, being able to speak intelligibly is a major achievement in language acquisition and development. Furthermore, intelligibility is considered to be the most practical index to assess competence in oral communication (Kent, Miolo, & Bloedel, 1994). Consequently, it serves as a key indicator for evaluating the effectiveness of various interventions like speech therapy or cochlear implantation (Chin, Bergeson, & Phan, 2012). Speech intelligibility refers to the extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words (Freeman, Pisoni, Kronenberger, & Castellanos, 2017; van Heuven, 2008; Whitehill & Chau, 2004). Studies that investigate intelligibility have utilized entropy scores to examine differences in children’s intelligibility, particularly between those with normal hearing and those with cochlear implants (Boonen et al., 2021).
However, despite their potential as a fine-grained metric of intelligibility, as proposed by Boonen et al. (2021), entropy scores exhibit a statistical complexity that cautions researchers against treating them as straightforward indices of intelligibility. This complexity emerges from the processes of data collection and transcription aggregation, endowing the scores with four distinctive features: boundedness, measurement error, clustering, and the possible presence of outliers and heteroscedasticity. Firstly, entropy scores are confined to an interval between zero and one, a phenomenon known as boundedness (refer to Section 3.2). Secondly, entropy scores are a manifestation of a speaker’s intelligibility, with this intelligibility being the primary factor influencing the observed scores. This issue is commonly referred to as measurement error (refer to Section 3.4). Thirdly, due to the repeated assessment of speakers through multiple speech samples, the scores exhibit clustering (refer to Section 3.3). Lastly, driven by the specific set of speakers and speech samples under scrutiny, these scores often display a potential for the presence of outliers and heteroscedasticity (refer to Section 3.5).
Failure to collectively address these data features can result in numerous statistical challenges that might hamper the researcher’s ability to investigate intelligibility. Notably, neglecting boundedness can, at best, lead to underfitting and, at worst, to misspecification. Underfitting can cause the generation of inconsistent predictions, thus hindering the model’s ability to generalize when confronted with new data. Conversely, misspecification can lead to inconsistent and less efficient parameter estimates (refer to Section 3.2.3). Additionally, overlooking issues such as measurement error, clustering, outliers or heteroscedasticity can lead to biased and less precise parameter estimates, ultimately diminishing the statistical power of models and increasing the likelihood of committing type I or type II errors when addressing research inquiries (refer to Section 3.4.3, Section 3.3.3, and Section 3.5.4).
In the realm of computational statistics and data analysis, several models have been developed to address some of these data features individually and, at times, collectively. All of these models have found moderate adoption in various fields, including speech communication, psychology, education, health care, chemistry, and policy analysis. Specifically, in the domain of speech communication, Boonen et al. (2021) addressed data clustering within the context of intelligibility research. Conversely, de Brito Trindade et al. (2021) and Kangmennaang et al. (2023) concentrated on tackling non-normal bounded data with measurement error in covariates, within the context of chemical reactions and health care access, respectively. Remarkably, despite these individual efforts, there is, to the best of the authors’ knowledge, no study comprehensively addressing all of these data features in a principled way while also transparently and systematically documenting the Bayesian estimation of the resulting statistical models.
5 Research questions
Considering the imperative need to comprehensively address all data features when investigating unobservable and complex traits, this investigation aims to demonstrate the efficacy of the Generalized Linear Latent and Mixed Model (GLLAMM) in handling entropy scores features when exploring research theories concerning speech intelligibility. To achieve this objective, the study will reexamine data originating from transcriptions of spontaneous speech samples, initially collected by Boonen et al. (2021). Subsequently, this data will be aggregated into entropy scores and subjected to modeling through the Bayesian Beta-proportion GLLAMM.
To address the primary objective, the study poses three key research questions. First, given the importance of accurate predictions in developing useful practical models and testing research theories (Shmueli & Koppius, 2011), Research Question 1 (RQ1) assesses whether the Beta-proportion GLLAMM yields more accurate predictions than the more prevalent Normal Linear Mixed Model (LMM) (Holmes et al., 2019). Second, acknowledging that intelligibility is an unobservable, intricate concept and a key indicator of oral communication competence (Kent et al., 1994), Research Question 2 (RQ2) investigates how the proposed model can estimate speakers’ latent intelligibility from manifest entropy scores. Third, recognizing that research involves developing and comparing theories, Research Question 3 (RQ3) illustrates how these research theories can be examined within the model’s framework. Specifically, RQ3 assesses the influence of speaker-related factors on the newly estimated latent intelligibility.
The findings of this study will equip researchers investigating speech intelligibility using entropy scores, or those grappling with similar data challenges, with a statistical tool that improves upon existing research models. The tool will provide an assessment of the predictability of empirical phenomena, along with the capability to develop a quantitative measure for the latent variable of interest. The latter, in turn, could facilitate the appropriate comparison of existing theories related to the latent variable, and even the development of new ones.
6 Data
The data comprised the transcriptions of spontaneous speech samples originally collected by Boonen et al. (2021). The data is not publicly available due to privacy restrictions. Nonetheless, the data can be provided by the corresponding author upon reasonable request.
6.1 Speakers
Boonen et al. (2021) selected 32 speakers, comprising 16 normal hearing children (NH) and 16 hearing-impaired children with cochlear implants (HI/CI). At the time of the collection of the speech samples, the NH group were between 68 and 104 months old (M = 86.3, SD = 9.0), while the HI/CI group were between 78 and 98 months old (M = 86.3, SD = 6.7).
6.2 Speech samples
Boonen and colleagues selected speech samples from a large corpus of children’s spontaneously spoken speech recordings. These recordings were obtained as the children narrated a story prompted by the picture book “Frog, Where Are You?” (Mayer, 1969) to a caregiver ‘unfamiliar with the story’. Before recording, the children were allowed to skim over the booklet and examine pictures. Prior to the selection process, the recordings were orthographically transcribed using the CHAT format in the CLAN editor (MacWhinney, 2020). These transcriptions were exclusively used in the curation of appropriate speech samples. To ensure the quality of the selection, Boonen and colleagues excluded sentences containing syntactically ill-formed or incomplete statements, background noise, crosstalk, long hesitations, revisions, or non-words. Finally, ten speech samples were randomly chosen for each of the 32 selected speakers. Each of these samples comprised a single sentence with a length of three to eleven words (M = 7.1, SD = 1.1). The process resulted in a total of 320 selected sentences collectively comprising 2,263 words.
Speech samples
Sentences with a length of three to eleven words (M = 7.1, SD = 1.1).
6.3 Listeners
Boonen and colleagues recruited 105 students from the University of Antwerp. All participants were native speakers of Belgian Dutch and reported no history of hearing difficulties or prior exposure to the speech of hearing-impaired speakers.
6.4 Transcription task
The 320 speech samples and 105 listeners were randomly assigned to five blocks, with each block consisting of approximately 21 listeners who transcribed 64 sentences presented in random order. This resulted in a total of 47,514 transcribed words from the original 2,263 words present in the speech samples. These orthographic transcriptions were automatically aligned with a Python script at the sentence level, in a column-like grid structure like the one presented in Table 1. This alignment process was repeated for each sentence within each speaker and block, and the output was manually checked and adjusted (if needed) in order to appropriately align the words. For more details on the random assignment and alignment procedures refer to Boonen et al. (2021).
6.5 Entropy calculation
Next, this study aggregated the aligned transcriptions across listeners, yielding 2,263 entropy scores, one score per word. The entropy scores were calculated following Shannon’s formula (1948):
\begin{align*}
H_{wsib} &= - \frac{ \sum_{k=1}^{K} p_{k} \cdot log_{2}( p_{k} ) }{ log_{2}( J ) }
\end{align*}
where H_{wsib} denotes the entropy scores confined to an interval between zero and one, with w defining the word index, s the sentence index, i the speaker index, and b the block index. Moreover, K describes the number of different word types within transcriptions, and J defines the total number of word transcriptions. Notice that by design, the total number of word transcriptions J corresponds with the number of listeners per block, i.e., 21 listeners. Lastly, p_{k} = \sum_{j=1}^{J} 1(T_{jk}) / J denotes the proportion of word types within transcriptions, with 1(T_{jk}) describing an indicator function that takes the value of one when the word type k is present in the transcription j.
These entropy scores served as the outcome variable, capturing agreement or disagreement among listeners’ word transcriptions. Lower scores indicated a higher degree of agreement between transcriptions and therefore higher intelligibility, while higher scores indicated lower intelligibility, due to a lower degree of agreement in the transcriptions (Boonen et al., 2021; Faes, De Maeyer, & Gillis, 2021). Furthermore, no scores were excluded from the modeling process through univariate procedures; rather, the identification of highly influential observations is performed within the context of the proposed models, as recommended by McElreath (2020) (refer to Section 3.5).
Entropy interpretation
Lower scores indicated a higher degree of agreement between transcriptions and therefore higher intelligibility, while higher scores indicated lower intelligibility, due to a lower degree of agreement in the transcriptions (Boonen et al., 2021; Faes et al., 2021)
Table 1: Hypothetical alignment of word transcriptions and entropy scores. Note: Extracted from Boonen et al. (2021), and slightly modified for illustrative purposes. Entropy scores are calculated for the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners \left( s=1, i=1, b=1, J=5 \right). Transcriptions are in Dutch with English translations. [B] represents a blank space, and [X] unidentifiable speech.
| Transcription Number | Word 1 | Word 2 | Word 3 | Word 4 | Word 5 |
|---|---|---|---|---|---|
| 1 | de (the) | jongen (boy) | ziet (sees) | een (a) | kikker (frog) |
| 2 | de (the) | jongen (boy) | ziet (sees) | de (the) | [X] |
| 3 | de (the) | jongen (boy) | zag (saw) | [B] | kokkin (cook) |
| 4 | de (the) | jongen (boy) | zag (saw) | geen (no) | kikkers (frogs) |
| 5 | de (the) | hond (dog) | zoekt (searches) | een (a) | [X] |
| Entropy | 0 | 0.3109 | 0.6555 | 0.8277 | 1 |
In this context, it is relevant to exemplify the entropy calculation procedure. For that purpose, the words in positions two, four, and five of Table 1 were used. These words were assumed present in the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners (w=\{2,4,5\}, s=1, i=1, b=1, J=5). For word 2, the first four listeners identified the word type jongen (T_{j1}), while the last identified the word type hond (T_{j2}). Therefore, two word types were identified (K=2), with proportions equal to \{ p_{1}, p_{2} \} = \{ 4/5, 1/5 \} = \{ 0.8, 0.2 \}, and entropy score equal to:
H_{2111} = - \frac{ 0.8 \cdot log_{2}(0.8) + 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.3109
For word 4, two listeners identified the word type een (T_{j1}), one listener the word type de (T_{j2}), and another the word type geen (T_{j3}). The remaining listener produced a blank space [B], a symbol that denotes the absence of a word in a position where one is expected, as compared with the other transcriptions during the alignment procedure. Notice that, for calculation purposes, because the blank space is not expected in such position, it is considered a different word type. Consequently, four word types were registered (K=4), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4} \} = \{ 2/5, 1/5, 1/5, 1/5 \} = \{ 0.4, 0.2, 0.2, 0.2 \} and entropy score equal to:
H_{4111} = - \frac{ 0.4 \cdot log_{2}(0.4) + 3 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.8277
Lastly, for word 5, each listener transcribed a different word. It is important to highlight that when a listener does not identify a complete word, or part of it, (s)he is instructed to write [X] in that position. However, for the calculation of the entropy score, if more than one listener marks an unidentifiable word with [X], each one of them is considered a different word type. This is done to avoid the artificial reduction of the entropy score, as [X] values already indicate the word’s lack of intelligibility. Consequently, five word types were observed, T_{j1}=kikker, T_{j2}=[X], T_{j3}=kokkin, T_{j4}=kikkers, T_{j5}=[X] (K=5), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4}, p_{5} \} = \{ 1/5, 1/5, 1/5, 1/5, 1/5 \} = \{ 0.2, 0.2, 0.2, 0.2, 0.2 \}, and entropy score equal to:

H_{5111} = - \frac{ 5 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5) } = 1
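The worked calculations above can be reproduced with a short R function. This is a minimal sketch for illustration; `word_entropy` is a hypothetical helper, not part of the study's code, and it renames repeated [X] marks so each counts as a distinct word type, as described in the text.

```r
# Normalized Shannon entropy of one word position across J listener transcriptions
word_entropy <- function(transcriptions) {
  # Treat repeated [X] (unidentifiable speech) marks as distinct word types
  x <- transcriptions == "[X]"
  transcriptions[x] <- paste0("[X]", seq_len(sum(x)))
  J <- length(transcriptions)
  p <- table(transcriptions) / J   # proportions p_k of each word type
  -sum(p * log2(p)) / log2(J)      # normalized entropy, confined to [0, 1]
}

# Word 2 of Table 1: four listeners wrote "jongen", one wrote "hond"
word_entropy(c("jongen", "jongen", "jongen", "jongen", "hond"))   # ≈ 0.3109

# Word 5 of Table 1: five distinct word types, including two [X] marks
word_entropy(c("kikker", "[X]", "kokkin", "kikkers", "[X]"))      # = 1
```

Full agreement (all five listeners writing the same word) yields an entropy of 0, matching word 1 of Table 1.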
As expected, the data exploration reveals from the start two significant features of the entropy scores: clustering and boundedness (refer to Section 3.2.3 and Section 3.3.3). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originating from different speakers and evaluated in different blocks (see code output below, depicting the first ten observations of the data). On the other hand, entropy scores exhibit boundedness as they can only take on values within the continuous interval between zero and one, particularly H_{wsib} \in [0,1] (see Figure 15 showing three randomly selected speakers).
Figure 15: Entropy scores distribution: all sentences of selected speakers
Additionally, the data shows the 320 speakers’ speech samples consist of sentences with a minimum of 3 and a maximum of 11 words per sentence (M=7.1, SD=1.1), where most of the speech samples have between 5 and 9 words per sentence (see Figure 16).
Code
speech_samples = data.frame( speech_samples )
hist( speech_samples$Freq, breaks=20, xlim=c(2, 12), main='', xlab='words per sentence' )
Figure 16: Histogram of words per sentences in the speech samples
Code
psych::describe( speech_samples$Freq )
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 320 7.07 1.06 7 7.04 1.48 3 11 8 0.19 1.45 0.06
Moreover, the data comprised 16 normal hearing children (NH, hearing status category 1) and 16 hearing impaired children with cochlear implants (HI/CI, hearing status category 2). At the time of the collection of the speech samples, the NH group were between 68 and 104 months old (M=86.3, SD=9.0), while the HI/CI group were between 78 and 98 months old (M=86.3, SD=6.7).
Code
d_mom = unique( data_H[ , c('cid','HS','A') ] )
with( d_mom, table( A, HS ) )
Lastly, before fitting the models using Bayesian inference, the data was formatted as a list including all necessary information for the fitting process:
List of 14
$ N : int 2263
$ B : int 5
$ I : int 32
$ U : int 10
$ W : num 11
$ cHS : int 2
$ bid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ cid : int [1:2263] 1 1 1 1 1 1 1 1 1 1 ...
$ uid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ wid : num [1:2263] 1 2 3 4 5 6 7 8 1 2 ...
$ HS : int [1:2263] 2 2 2 2 2 2 2 2 2 2 ...
$ A : int [1:2263] 85 85 85 85 85 85 85 85 85 85 ...
$ Am : int [1:2263] 17 17 17 17 17 17 17 17 17 17 ...
$ Hwsib: num [1:2263] 0.057 0.279 0.279 0.461 0.113 ...
7 Methods
This section articulates the probabilistic formalism of both the Normal LMM and the proposed Beta-proportion GLLAMM. Subsequently, it details the set of fitted models and the estimation procedure, along with the criteria employed to assess the quality of the Bayesian inference results. Lastly, the section outlines the methodology employed for model comparison.
7.1 Statistical models
7.1.1 Normal LMM
The general mathematical formalism of the Normal LMM posits that the likelihood of the (manifest) entropy scores H_{wsib} follows a normal distribution, i.e.,
\begin{align}
H_{wsib} &\sim \text{Normal}\left( \mu_{sib}, \sigma_{i} \right)
\end{align}
\tag{5}
where \mu_{sib} represents the average entropy at the word-level and \sigma_{i} denotes the standard deviation of the average entropy at the word-level, varying for each speaker. Given the clustered nature of the data, \mu_{sib} is defined by the linear combination of individual characteristics and several random effects:
\begin{align}
\mu_{sib} &= \alpha + \alpha_{HS[i]} + \beta_{A,HS[i]} \cdot \left( A_{i} - \bar{A} \right) + u_{si} + e_{i} + a_{b}
\end{align}
\tag{6}
where HS_{i} and A_{i} denote the hearing status and chronological age of speaker i, respectively. Additionally, \alpha denote the general intercept, \alpha_{HS[i]} represents the average entropy for each hearing status group, and \beta_{A,HS[i]} denotes the evolution of the average entropy per unit of chronological age A_{i} for each hearing status group. Furthermore, u_{si} denotes the sentence-speaker random effects measuring the unexplained entropy variability within sentences for each speaker, e_{i} denotes the speaker random effects describing the unexplained entropy variability between speakers, and a_{b} denotes the block random effects assessing the unexplained variability between experimental blocks.
Several notable features of the Normal LMM can be discerned from the equations. Firstly, Equation 5 indicates that the variability of the average entropy at the word-level can differ for each speaker, enhancing the model’s robustness to mild or moderate data departures from the normal distribution assumption, such as heteroscedasticity or outliers (refer to Section 3.5). Secondly, Equation 6 reveals that the model assumes no transformation is applied to the relationship between the average entropy and the linear predictor. This is commonly known as a direct (identity) link function. Moreover, Equation 6 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. The centering procedure is employed to prevent the interpretation of parameters outside the range of chronological ages available in the data (Everitt & Skrondal, 2010). Lastly, the equation implies the model considers a separate intercept and a separate slope of age for each hearing status group, i.e., NH and HI/CI speakers.
Centering
Procedure used to facilitate the interpretation of regression parameters (Everitt & Skrondal, 2010).
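Centering at the sample minimum can be sketched in one line of R. The ages below are illustrative; note that the minimum age of 68 months matches the sample described in Section 6.1, so a speaker aged 85 months gets a centered age of 17, as in the Am variable of the data list shown earlier.

```r
# Centering chronological age (in months) at the sample minimum
A  <- c(68, 78, 85, 104)   # illustrative ages; 68 is the reported sample minimum
Am <- A - min(A)           # centered ages: 0 10 17 36
```

After centering, the intercept is interpretable as the expected outcome for the youngest speaker in the sample, rather than for an (unobserved) speaker of age zero.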
7.1.2 Beta-proportion GLLAMM
The general mathematical formalism of the proposed Beta-proportion GLLAMM comprises four components: a response model, with its likelihood, linear predictor, and link function, and a structural model. The response model posits the likelihood of entropy scores follows a beta-proportion distribution,
\begin{align}
H_{wsib} &\sim \text{BetaProportion}\left( \mu_{ib}, M_{i} \right)
\end{align}
\tag{7}
where \mu_{ib} denotes the average entropy at the word-level and M_{i} signifies the dispersion of the average entropy at the word-level, varying for each speaker. Additionally, \mu_{ib} is defined as,
\begin{align}
\mu_{ib} &= \text{logit}^{-1}\left( a_{b} - SI_{i} \right)
\end{align}
\tag{8}
where \text{logit}^{-1}(x) = exp(x) / (1+exp(x)) is the inverse-logit link function, a_{b} denotes the block random effects, and SI_{i} describes the speaker’s latent potential intelligibility. Conversely, the structural model relates the speakers’ latent potential intelligibility to the individual characteristics:
\begin{align}
SI_{i} &= \alpha + \alpha_{HS[i]} + \beta_{A,HS[i]} \cdot \left( A_{i} - \bar{A} \right) + e_{i} + u_{i}
\end{align}
\tag{9}
where \alpha defines the general intercept, \alpha_{HS[i]} denotes the potential intelligibility for different hearing status groups, and \beta_{A,HS[i]} indicates the evolution of potential intelligibility per unit of chronological age for each hearing status group. Furthermore, e_{i} represents the speaker random effects, describing unexplained potential intelligibility variability between speakers, and u_{i} = \sum_{s=1}^{S} u_{si}/S denotes the sentence random effects, assessing the average unexplained potential intelligibility variability among sentences within each speaker, with S denoting the total number of sentences per speaker.
Several features are evident in this probabilistic representation. Firstly, akin to the Normal LMM, Equation 7 reveals that the dispersion of average entropy at the word level can differ for each speaker. This enhances the model’s robustness to mild or moderate data departures from the beta-proportion distribution assumption (refer to Section 3.5). Secondly, in contrast with the Normal LMM, Equation 8 shows the potential intelligibility of a speaker has a negative non-linear relationship with the entropy scores, explicitly highlighting the inverse relationship between intelligibility and entropy. This feature also maps the unbounded linear predictor to the bounded limits of the entropy scores. Thirdly, in contrast with the Normal LMM, Equation 9 demonstrates that the structural parameters are interpretable in terms of the latent potential intelligibility scores, where the scale of the latent trait is set by the general intercept \alpha, as required in latent variable models (Depaoli, 2021). Furthermore, the equation implies the model also considers a separate intercept and a separate slope of age for each hearing status group, i.e., NH and HI/CI speakers. Additionally, Equation 9 indicates that chronological age is centered around the minimum chronological age in the sample \bar{A}. Lastly, the same equation assumes the intelligibility scores have two sources of unexplained variability: e_{i} and u_{i}. The former represents inherent differences in potential intelligibility among different speakers, while the latter assumes that different sentences measure potential intelligibility differently due to variations in word difficulties and their interplay within the sentence.
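The inverse-logit link and the negative relationship between latent intelligibility and expected entropy can be sketched numerically. This is a minimal illustration; setting the block effect a_{b} to zero, and the specific form of the predictor, are display assumptions consistent with the negative relationship described in the text.

```r
# Inverse-logit link: maps the unbounded linear predictor into (0, 1),
# the feasible range of the entropy scores
inv_logit <- function(x) exp(x) / (1 + exp(x))

# Higher latent potential intelligibility SI implies lower expected entropy
SI <- c(-2, 0, 2)
mu <- inv_logit(0 - SI)   # block effect a_b assumed zero for illustration
mu                        # decreasing in SI; mu = 0.5 when SI = 0
```

The mapping guarantees bounded predictions regardless of how extreme the linear predictor becomes, which is precisely what the Normal LMM's direct link cannot do.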
7.2 Prior distributions
Bayesian procedures require the incorporation of priors (refer to Section 3.1). This study establishes priors and hyperpriors for the parameters of both the Normal LMM and the Beta-proportion GLLAMM using prior predictive simulations. This procedure entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models’ specifications. The aim is to establish meaningful priors and comprehend their implications within the context of the model before incorporating any information derived from empirical data (McElreath, 2020).
Prior predictive simulations
Procedure that entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models’ specifications. The procedure aims to establish meaningful priors and comprehend their implications within the context of the model before incorporating any information derived from empirical data (McElreath, 2020).
7.2.1 Normal LMM
For the parameters of the Normal LMM, non-informative priors and hyperpriors are established to align with analogous model assumptions in frequentist methods (refer to Section 3.1.4). The specified priors are as follows:
7.2.1.1 Standard deviation \sigma_{i}
As described in Section 7.3, the models initially consider one \sigma prior for all the speakers. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
The left panel of Figure 17 shows the weakly informative prior expects \sigma to be possible only in a positive range, as required for variability parameters (Depaoli, 2021). Furthermore, the right panel of Figure 17 shows that when transformed to the entropy scale, the model expects predictions to fall beyond the feasible range of the outcome.
Figure 17: Normal LMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale
Furthermore, as described in Section 7.1.1 and Section 7.3, there is a possibility that the model considers one \sigma_{i} prior for each of the speakers in the data. This choice implies that the presumed uncertainty about unexplained variability of the average entropy at the word-level is similar for each speaker, prior to observing empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
\begin{align}
r_{S} &\sim \text{Exponential}\left( 2 \right) \\
\sigma_{i} &\sim \text{Exponential}\left( r_{S} \right)
\end{align}
\tag{11}
The left panel of Figure 18 shows the weakly informative prior expects \sigma_{i} to be possible only in a positive range, as required for variability parameters (Depaoli, 2021). The panel also shows the parameters are most likely to fall in the interval [0, 2.5]. Moreover, the right panel of Figure 18 shows that when the prior is transformed to the entropy scale, the model expects scores to fall beyond the feasible range of the outcome.
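The hierarchical prior of Equation 11 can be inspected with a quick prior predictive simulation, in the spirit of the procedure described above. This is a sketch only; the number of draws and the seed are arbitrary.

```r
# Prior predictive draws for the hierarchical prior of Equation 11
set.seed(1)
r_S   <- rexp(1e4, rate = 2)      # hyperprior on the exponential rate
sigma <- rexp(1e4, rate = r_S)    # implied marginal prior for each speaker's sigma_i

# sigma is strictly positive, as required for variability parameters;
# its quantiles summarize where the prior places its mass
quantile(sigma, c(0.05, 0.50, 0.95))
```

Simulations like this make the implications of a prior visible before any empirical data enter the model.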
Figure 18: Normal LMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale
7.2.1.2 Intercepts \alpha
This parameter is used in preliminary models where no mathematical formulations regarding how speaker-related factors influence intelligibility are investigated. The prior distribution for \alpha under the Normal LMM is described in Equation 12.
The left panel of Figure 19 shows the prior is narrowly concentrated around zero. Moreover, the right panel of Figure 19 demonstrates that, when the parameter is transformed to the entropy scale, the model anticipates entropy scores at low levels of the feasible range of the outcome. This implies that the prior expects a particular bias in entropy scores towards lower values.
Figure 19: Normal LMM, general intercept prior distribution: parameter and entropy scale
7.2.1.3 Hearing status effects \alpha_{HS[i]}
The prior distribution for the Normal LMM is described in Equation 13. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
The left panel of Figure 20 reveals a weakly informative prior that restricts the probable range of \alpha_{HS[i]} to [0.3, 0.7]. This implies that no particular bias towards entropy values above or below 0.5 for the different hearing status groups is present in the priors. However, the right panel of Figure 20 demonstrates that, when the prior is transformed to the entropy scale, the model anticipates a concentration of data around low levels of entropy, but also beyond the feasible range of the outcome.
Figure 20: Normal LMM, hearing status effects prior distribution: parameter and entropy scale
7.2.1.4 Chronological age per hearing status \beta_{A,HS[i]}
The prior distribution for the Normal LMM is described in Equation 14. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of entropy attributed to chronological age in each category is presumed to have similar uncertainties prior to the observation of empirical data.
The left panel of Figure 21 shows the prior restricts \beta_{A,HS[i]} to lie mostly within the range [-0.4, 0.4]. This implies that there is no particular bias towards a positive or negative evolution of entropy scores due to chronological age per hearing status group. However, the right panel of Figure 21 shows that, when this prior is transformed to the entropy scale, the model anticipates a concentration of entropy values at lower levels, but it also expects entropy scores significantly beyond the feasible range of the outcome.
Figure 21: Normal LMM, chronological age per hearing status effects prior distribution: parameter and entropy scale
7.2.1.5 Speaker differences e_{i}
The prior distribution of e_{i} for the Normal LMM is described in Equation 15. The same prior is assigned to each speaker in the sample. This choice implies that differences in entropy scores between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
The left panel of Figure 22 shows the prior anticipates differences in entropy scores between speakers as large as 3 units of entropy. However, the right panel of Figure 22 demonstrates that, when transformed to the entropy scale, the model anticipates a concentration of scores around low levels, but it also expects the differences to go well beyond the feasible range of the outcome.
Figure 22: Normal LMM, speaker differences prior distribution: parameter and entropy scale
7.2.1.6 Within sentence-speaker differences u_{si}
The prior distribution of u_{si} for the Normal LMM is described in Equation 16. The same prior is assigned to each sentence within each speaker in the sample. This choice implies that the average entropy score differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters (refer to Section 3.1.5).
The left panel of Figure 23 shows the prior allows the average differences in entropy among sentences within speakers to be as large as 3 units of measurement. Furthermore, the right panel of Figure 23 demonstrates that, when transformed to the entropy scale, the model anticipates a concentration of scores around mid-levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome.
Figure 23: Normal LMM, within sentence-speaker average differences prior distribution: parameter and entropy scale
7.2.1.7 Random block effect a_{b}
The prior distribution for the Normal LMM is described in Equation 17. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
The left panel of Figure 24 shows a prior with no particular bias towards differences between blocks above or below zero units of entropy. Nevertheless, the right panel of Figure 24 demonstrates that, when the prior is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, but also contemplates differences beyond the feasible range of the outcome.
Figure 24: Normal LMM, block differences prior distribution: parameter and entropy scale
7.2.1.8 Linear predictor g(\cdot)
After the careful assessment of the prior implications for each parameter, the expected prior distribution for the linear predictor can be constructed for the Normal LMM. The prior predictive simulation can be described as in Equation 18:
\begin{align}
m &\sim \text{Normal} \left( 0, 0.05 \right) \\
s &\sim \text{Exponential} \left( 2 \right) \\
e_{i} &\sim \text{Normal} \left( m, s \right) \\
u_{si} &\sim \text{Normal} \left( m, s \right) \\
a_{b} &\sim \text{Normal} \left( m, s \right) \\
\alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \\
\beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\
g(\cdot) &= \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{si} + a_{b} \\
\end{align}
\tag{18}
The left panel of Figure 25 shows the prior expects speakers’ potential intelligibility scores to be more probable between [-2.5, 2.5], implying no particular bias towards positive or negative entropy scores is present jointly in these priors. Furthermore, the right panel of Figure 25 demonstrates that, when transformed to the entropy scale, the model anticipates predictions of entropy scores within its feasible range, but somewhat more probable in the extremes of entropy.
Figure 25: Normal LMM, linear predictor distribution: parameter and entropy scale
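As a minimal Python sketch of this prior predictive simulation (the study itself uses R and Stan), one can draw each term of Equation 18 and inspect the resulting linear predictor; evaluating at the mean age, so that the slope term drops out, is a simplifying assumption made here:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Shared hyperpriors for the random terms (Equation 18)
m = rng.normal(0, 0.05, size=n)
s = rng.exponential(scale=1 / 2, size=n)

e = rng.normal(m, s)                # speaker differences
u = rng.normal(m, s)                # within sentence-speaker differences
a = rng.normal(m, s)                # block effect
alpha = rng.normal(0, 0.2, size=n)  # hearing status intercept
beta = rng.normal(0, 0.1, size=n)   # age slope

age_dev = 0.0                       # assumption: evaluate at the mean age
g = alpha + beta * age_dev + e + u + a

# Under the identity link, many prior draws leave the feasible [0, 1] range
frac_outside = np.mean((g < 0) | (g > 1))
print(frac_outside)
```

The substantial fraction of draws outside [0, 1] illustrates why the Normal LMM priors, however weakly informative, cannot respect the boundedness of entropy scores.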
7.2.2 Beta-proportion GLLAMM
For the parameters of the Beta-proportion GLLAMM, weakly informative priors and hyperpriors are established (refer to Section 3.1.4). The specified priors are as follows:
7.2.2.1 Sample size M_{i}
Similar to the Normal LMM, Section 7.3 describes a Beta-proportion GLLAMM that initially considers one M for all speakers in the data. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.
\begin{align}
M &\sim \text{Exponential}\left( 0.4 \right)
\end{align}
\tag{19}
The left and right panels of Figure 26 demonstrate that the prior of M expects the parameter to be more probable in a positive range between [0, 7], while predicting scores within the boundaries of the data. This implies that no particular bias is present in the word-level entropy unexplained variability, only that it is positive, as expected for measures of variability.
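A small Python sketch (the study's own code is in R and Stan) can illustrate both claims: the Exponential(0.4) prior keeps M positive with most of its mass below 7, and, combined with a hypothetical mean entropy \mu = 0.5, the Beta-proportion distribution can only generate scores inside [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Prior for the sample-size parameter (Equation 19): M ~ Exponential(0.4)
M = rng.exponential(scale=1 / 0.4, size=n)
assert (M > 0).all()                  # positive, as required for dispersion
print(np.mean(M <= 7))                # most prior mass falls below 7

# Assumption: a hypothetical mean entropy mu = 0.5 on the outcome scale
mu = 0.5
entropy = rng.beta(mu * M, (1 - mu) * M)
assert (entropy >= 0).all() and (entropy <= 1).all()   # bounded by construction
```

In this mean/sample-size parameterization, larger M concentrates the Beta-proportion draws around \mu, so the prior on M directly encodes the presumed unexplained word-level variability.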
Furthermore, as described in Section 7.1.2 and Section 7.3, there is a possibility that the model considers one M_{i} prior for each speaker in the data. This choice implies the presumed uncertainty for the unexplained dispersion of the average entropy at the word-level is similar for each speaker, prior to the observation of empirical data. In this case the parameters are defined in terms of hyperpriors (refer to Section 3.1.5).
The left and right panels of Figure 27 demonstrate that the prior of M_{i} expects the parameters to be more probable in a positive range between [0, 20], while at the same time predicting data within the boundaries of the entropy scores. This implies that no particular bias is present in the word-level entropy unexplained variability, only that it is positive, as expected for measures of variability.
7.2.2.2 Intercepts \alpha
Considering that the structural parameters are now interpretable in terms of the (latent) potential intelligibility scores, the general intercept \alpha is used to set the scale of the latent trait, as is required in latent variable models (Depaoli, 2021) (refer to Section 3.1.4). The prior distribution for \alpha under the Beta-proportion GLLAMM is described in Equation 21.
The left panel of Figure 28 shows the prior is narrowly concentrated around zero. Moreover, the right panel of Figure 28 demonstrates that, when the parameter is transformed to the entropy scale, the model anticipates entropy scores at mid-levels of the feasible range of the outcome. This implies that no particular bias in entropy scores is expected by the prior.
Figure 28: Beta-proportion GLLAMM, general intercept prior distribution: parameter and entropy scale
7.2.2.3 Hearing status effects \alpha_{HS[i]}
The prior distribution for the Beta-proportion GLLAMM is described in Equation 22. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
The right panel of Figure 29 demonstrates that, when the \alpha_{HS[i]} prior is transformed to the entropy scale, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards specific entropy score values is expected from using the prior.
Figure 29: Beta-proportion GLLAMM, hearing status effects prior distribution: parameter and entropy scale
7.2.2.4 Chronological age per hearing status \beta_{A,HS[i]}
The prior distribution for the Beta-proportion GLLAMM is described in Equation 23. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of potential intelligibility attributed to chronological age in each category is presumed to have similar uncertainties, prior to the observation of empirical data.
The left panel of Figure 30 shows the weakly informative prior has no particular bias towards a positive or negative evolution of potential intelligibility due to chronological age per hearing status group. Furthermore, the right panel of Figure 30 demonstrates that, when transformed to the entropy scale, the model anticipates a slight concentration of data around mid levels of entropy and, more importantly, does not expect data beyond the feasible range of the outcome.
Figure 30: Beta-proportion GLLAMM, chronological age per hearing status effects prior distribution: parameter and entropy scale
7.2.2.5 Speaker differences e_{i}
The prior distribution for the Beta-proportion GLLAMM is described in Equation 24. The same prior is assigned to each speaker in the sample. This choice implies that differences in potential intelligibility between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters, called hyperpriors (refer to Section 3.1.5).
The left panel of Figure 31 shows the prior anticipates differences in intelligibility between speakers as large as 3 units of measurement. Furthermore, the right panel of Figure 31 demonstrates that, when transformed to the entropy scale, the model anticipates a high concentration around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected from using this prior.
7.2.2.6 Average within sentence-speaker differences u_{i}
The prior distribution of u_{i} for the Beta-proportion GLLAMM is described in Equation 25. The same prior is assigned to each sentence within each speaker in the sample. This choice implies that the average potential intelligibility differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5). The within sentence-speaker differences are then aggregated to the speaker level to form the sentence random effects u_{i}.
The left panel of Figure 32 shows the prior allows the average differences in potential intelligibility among sentences within speakers to be as large as 0.8 units of measurement. Furthermore, the right panel of Figure 32 demonstrates that, when u_{i} is transformed to the entropy scale, the model anticipates a high concentration of scores around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected.
Figure 32: Beta-proportion GLLAMM, average within sentence-speaker differences prior distribution: parameter and entropy scale
7.2.2.7 Random block effect a_{b}
The prior distribution for the Beta-proportion GLLAMM is described in Equation 26. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to Section 3.1.5).
The left panel of Figure 33 shows a prior with no particular bias towards positive or negative differences between blocks. Furthermore, the right panel of Figure 33 demonstrates that, when transformed to the entropy scale, the model anticipates a high concentration of data around mid levels of entropy, but not beyond the feasible range of the outcome.
7.2.2.8 Linear predictor g(\cdot)
After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed for the Beta-proportion GLLAMM. The prior predictive simulation can be described as in Equation 27:
The left panel of Figure 34 shows the prior expects speakers’ potential intelligibility scores to be more probable between [-3, 3], implying no particular bias towards positive or negative potential intelligibility is present jointly in these priors. Furthermore, the right panel of Figure 34 demonstrates that, when transformed to the entropy scale, the model anticipates prediction of entropy scores within its feasible range, but somewhat more probable in the extremes of entropy.
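The boundedness argument can be sketched in a few lines of Python (hypothetical draws standing in for the joint prior of Equation 27): whatever real value the linear predictor takes, the inverse-logit link maps it into (0, 1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption: Normal(0, 1.5) stands in for prior draws of the linear predictor,
# roughly spanning the [-3, 3] range visible in the left panel of Figure 34
g = rng.normal(0, 1.5, size=100_000)

# The inverse-logit link maps any real-valued predictor into (0, 1),
# so prior predictions cannot leave the feasible entropy range
mu = 1 / (1 + np.exp(-g))
assert (mu > 0).all() and (mu < 1).all()
```

This is the key structural contrast with the Normal LMM: bounded predictions come from the link function, not from the width of the priors.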
This study evaluates the comparative predictive capabilities of both the Normal LMM and the Beta-proportion GLLAMM (RQ1) while simultaneously examining various formulations regarding how speaker-related factors influence intelligibility (RQ3). In this context, the predictive capabilities of the models are intricately connected to these formulations. As a result, the study requires fitting 12 different models, each representing a specific way of investigating one or both research questions. The models comprise six versions each of the Normal LMM and the Beta-proportion GLLAMM. The differences among the models hinge on (1) whether they address data clustering in conjunction with measurement error, denoted as the model type, (2) the assumed distribution for the entropy scores, which aims to handle boundedness, (3) whether the model incorporates a robust feature to address mild or moderate departures of the data from distributional assumptions, and (4) the inclusion or exclusion of speaker-related factors in the models. A detailed overview of the fitted models is available in Table 2.
Table 2: Fitted models.
| Model | Model type | Entropy distribution | Robust feature | \beta_{HS[i]} | \beta_{A} | \beta_{A,HS[i]} |
|-------|------------|----------------------|----------------|---------------|-----------|-----------------|
| 1  | LMM    | Normal     | No  | No  | No  | No  |
| 2  | LMM    | Normal     | No  | Yes | Yes | No  |
| 3  | LMM    | Normal     | No  | Yes | No  | Yes |
| 4  | LMM    | Normal     | Yes | No  | No  | No  |
| 5  | LMM    | Normal     | Yes | Yes | Yes | No  |
| 6  | LMM    | Normal     | Yes | Yes | No  | Yes |
| 7  | GLLAMM | Beta-prop. | No  | No  | No  | No  |
| 8  | GLLAMM | Beta-prop. | No  | Yes | Yes | No  |
| 9  | GLLAMM | Beta-prop. | No  | Yes | No  | Yes |
| 10 | GLLAMM | Beta-prop. | Yes | No  | No  | No  |
| 11 | GLLAMM | Beta-prop. | Yes | Yes | Yes | No  |
| 12 | GLLAMM | Beta-prop. | Yes | Yes | No  | Yes |

Note: the last three columns indicate the fixed effects included in each model.
The following tabset panel provides the commented Stan code for all fitted models. Furthermore, the models are implemented using non-centered priors (refer to Section 3.1.5).
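A non-centered parameterization draws a standardized deviate and then shifts and scales it by the hyperparameters, which improves the sampling geometry without changing the implied prior. A minimal Python sketch of the equivalence, using the hyperpriors of Equation 18 for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hyperpriors as in Equation 18
m = rng.normal(0, 0.05, size=n)
s = rng.exponential(scale=1 / 2, size=n)

# Centered parameterization: sample e directly from Normal(m, s)
e_centered = rng.normal(m, s)

# Non-centered parameterization: sample a standard normal and rescale,
# which decouples e from its hyperparameters during MCMC sampling
z = rng.normal(0, 1, size=n)
e_noncentered = m + s * z

# Both parameterizations imply the same marginal prior for e
assert abs(e_centered.mean() - e_noncentered.mean()) < 0.02
assert abs(e_centered.std() - e_noncentered.std()) < 0.05
```

In the Stan code, only the non-centered form is sampled; the centered quantities are recovered deterministically in the transformed parameters.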
The models were estimated using R version 4.2.2 (R Core Team, 2015) and Stan version 2.26.1 (Stan Development Team., 2021). Four Markov chains were implemented for each parameter, each with distinct starting values. Each chain underwent 4,000 iterations, where the first 2,000 served as a warm-up phase and the remaining 2,000 were considered samples from the posterior distribution.
7.5 Chain quality and information
Verification of stationarity, convergence, and mixing for the parameter chains involved graphical analysis and diagnostic statistics. Graphical analysis utilized trace, trace-rank, and autocorrelation (ACF) plots. Diagnostic statistics included the potential scale reduction factor \widehat{\text{R}} with a cut-off value of 1.05 (A. Vehtari, Gelman, Simpson, Carpenter, & Bürkner, 2021a). Furthermore, to confirm whether the parameters' posterior distributions were generated with a sufficient number of uncorrelated sampling points, each posterior distribution density plot was inspected along with its effective sample size statistic n_{\text{eff}} (Gelman et al., 2014).
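For intuition, the basic (non-split) potential scale reduction factor can be computed from between- and within-chain variances; Vehtari et al. (2021a) refine this with rank normalization and chain splitting. A Python sketch:

```python
import numpy as np

def rhat(chains):
    """Basic potential scale reduction factor for an (n_chains, n_iter) array."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled posterior variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(1)
# Four well-mixed chains of 2,000 post-warm-up draws, mirroring the study's setup
chains = rng.normal(0, 1, size=(4, 2000))
print(rhat(chains))   # close to 1, well below the 1.05 cut-off
```

Values near 1 indicate the chains explore the same distribution; values above the 1.05 cut-off flag non-convergence.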
7.6 Model comparison
The study compares the fitted models using three criteria: the deviance information criterion (DIC) by Spiegelhalter et al. (Spiegelhalter, Best, Carlin, & van der Linde, 2002), the widely applicable information criterion (WAIC) by Watanabe (2013), and the Pareto-smoothed importance sampling criterion (PSIS) by Vehtari et al. (2017). These criteria score models in terms of deviations from perfect predictive accuracy, with smaller values indicating less deviation (McElreath, 2020). Specifically, DIC measures in-sample deviations, while WAIC and PSIS offer an approximate measure of out-of-sample deviations. Deviations from perfect predictive accuracy serve as the closest estimate for the Kullback-Leibler divergence (Kullback & Leibler, 1951), which measures the degree to which a model accurately represents the true distribution of the data. Moreover, WAIC and PSIS are considered full Bayesian criteria as they incorporate all the information encompassed in the parameter’s posterior distribution. This effectively integrates and reports the inherent uncertainty in the predictive accuracy estimates. Predictive accuracy aside, PSIS offers an additional advantage in identifying highly influential data points. To achieve this, the criterion uses a built-in warning system that flags observations that make out-of-sample predictions unreliable. The key intuition is that observations that are relatively unlikely, according to the model, exert more influence and render predictions more unreliable than those relatively expected (McElreath, 2020).
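For intuition, WAIC can be computed from a matrix of pointwise log-likelihoods as -2(\text{lppd} - p_{\text{WAIC}}). The following Python sketch uses hypothetical toy data; in practice a numerically stable log-sum-exp is preferable when the likelihood values are very small:

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (n_samples, n_obs) matrix of pointwise log-likelihoods."""
    # log pointwise predictive density (use log-sum-exp for tiny likelihoods)
    lppd = np.log(np.exp(log_lik).mean(axis=0))
    p_waic = log_lik.var(axis=0, ddof=1)       # effective number of parameters
    return -2 * (lppd.sum() - p_waic.sum())

# Hypothetical toy posterior: 1,000 draws of a normal mean, 50 observations
rng = np.random.default_rng(1)
y = rng.normal(0, 1, size=50)
mu = rng.normal(0, 0.1, size=(1000, 1))
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2   # Normal(mu, 1) log-density

print(waic(log_lik))   # smaller values indicate less deviation
```

Because the whole matrix of posterior draws enters the computation, the criterion propagates posterior uncertainty into the predictive accuracy estimate, which is what makes it a full Bayesian criterion.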
8 Results
This section presents the results of the Bayesian inference procedures, with particular emphasis on answering the three research questions.
The posterior estimates of the models are loaded in the following manner. file_id() is a user-defined function that identifies the stanfit generated files within a particular directory.
8.1 Predictive capabilities of the Beta-proportion GLLAMM compared to the Normal LMM (RQ1)
This research question evaluates the effectiveness of the Beta-proportion GLLAMM in handling the features of entropy scores by comparing its predictive accuracy to the Normal LMM. Models 1, 4, 7, and 10 are specifically chosen for this comparison because their assumptions exclusively address the features of the scores, without integrating additional covariate information. As detailed in Table 2, Model 1 is a Normal LMM that solely addresses data clustering. Building upon this, Model 4 introduces a robust feature. Conversely, Model 7 is a Beta-proportion GLLAMM that deals with boundedness, measurement error and data clustering, and Model 10 extends this model by incorporating a robust feature.
Figure 35 displays values for the DIC, WAIC, and PSIS. It also includes the components dWAIC and dPSIS, highlighting the differences in out-of-sample deviations from the best-fitting model and their associated uncertainty. The associated tables provide similar information, while also reporting the pWAIC and pPSIS values, which indicate the penalization the models receive for their complexity (roughly associated with their number of parameters). Lastly, the tables show the weight of evidence, which summarizes the relative support for each model.
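The weight of evidence reported in these tables follows the usual Akaike-weight construction, w_i \propto \exp(-\tfrac{1}{2}\,\text{dWAIC}_i). A Python sketch with hypothetical WAIC values shows why a large gap concentrates virtually all weight on one model:

```python
import numpy as np

# Hypothetical WAIC values for the four compared models (Models 1, 4, 7, 10)
waic_values = np.array([250.0, 230.0, 180.0, 150.0])

d_waic = waic_values - waic_values.min()   # deviations from the best model
weights = np.exp(-0.5 * d_waic)
weights = weights / weights.sum()          # weight of evidence per model

print(weights.round(3))   # virtually all weight lands on the lowest-WAIC model
```

Because the weights decay exponentially in dWAIC, even a gap of a few dozen units leaves essentially no support for the competing models.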
Overall, all criteria consistently point to Model 10 as the most plausible choice for the data. The model exhibits the lowest values for both WAIC and PSIS, establishing itself as the model with the least deviation from perfect predictive accuracy among those under comparison. Additionally, Figure 35 visually demonstrates the non-overlapping uncertainty (horizontal blue lines) in both dWAIC and dPSIS values for Models 1, 4, and 7 when compared to Model 10. This indicates that Model 10 deviates significantly less from perfect predictive accuracy than the rest of the models. Lastly, the weight of evidence in the tables underscores that 100\% of the evidence aligns with and supports Model 10.
user defined function: plot of Deviance, WAIC, PSIS, and dWAIC with confidence intervals
Figure 35: WAIC and PSIS model comparison plot. Note: Black and blue points describe point estimates, and continuous horizontal lines indicate the associated uncertainty.
Upon closer examination, the reasons behind the observed disparities in the models become more apparent. Specifically, Figure 36 highlights that the Normal LMM, as outlined in Model 4, fails to capture the underlying data patterns, resulting in predictions that are physically inconsistent, falling outside the outcome’s range between zero and one. Further insight into this issue is provided by Figure 37 and Figure 39. Figure 37 displays Model 4’s score prediction densities, which bear no resemblance to the actual data densities. Furthermore, the top two panels in Figure 39 reveal that misspecification in the Normal LMM causes the model to be more surprised by ‘extreme’ entropy scores, leading to their identification as highly unlikely and influential observations. Consequently, the model is rendered unreliable due to the potential biases present in the parameter estimates. In contrast, the Beta-proportion GLLAMM appears to effectively capture the data patterns, generating predictions within the expected data range. This is evident in Figure 36 and complemented by Figure 38 and Figure 39. In Figure 38, Model 10 displays prediction densities that bear more resemblance to the actual data densities. Furthermore, the bottom two panels in Figure 39 show the model is less surprised by ‘extreme’ scores, fostering more trust in the model’s estimates.
user defined function: plot entropy scores and predictions for selected models
Figure 36: Entropy scores prediction for selected models. Note: Black dots show manifest entropy scores, orange dots and vertical lines show the point estimates and 95% highest probability density interval (HPDI) derived from Model 4, blue dots and vertical lines show similar information for Model 10.
user defined function: entropy and predicted scores density plot for selected model
Figure 37: Model 4: Entropy scores density for selected speakers. Note: Black bars denote the true data density, orange bars describe the predicted data density
user defined function: entropy and predicted scores density plot for selected model
Figure 38: Model 10: Entropy scores density for selected speakers. Note: Black bars denote the true data density, blue bars describe the predicted data density
user defined function: outliers identification for selected model
Figure 39: Outlier identification and analysis for selected models. Note: Thin and thick vertical discontinuous line indicate threshold of 0.5 and 0.7, respectively. Number pair texts indicate the observation pair of speaker and sentence index.
8.2 Estimation of speakers’ latent potential intelligibility from manifest entropy scores (RQ2)
The second research question aimed to demonstrate the application of the Beta-proportion GLLAMM in estimating the latent potential intelligibility of speakers. This was achieved by employing the general mathematical formalism outlined in Equation 9, along with additional specifications provided in Table 2. The Bayesian procedure successfully estimated the latent potential intelligibility of speakers under Model 10 through the structural equation:
Moreover, due to its implementation under Bayesian procedures, Model 10 provides the complete posterior distribution of the speakers’ potential intelligibility scores. This provision, in turn, (1) enables the calculation of summaries, facilitating the ranking of individuals, and (2) supports the assessment of differences among selected speakers. In both cases, the model considers the inherent uncertainty of the estimates resulting from its measurement using multiple entropy scores.
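The HPDI used throughout these summaries is the narrowest interval containing the requested posterior mass; a difference is then flagged when that interval excludes zero. A minimal Python sketch with a hypothetical posterior of a difference:

```python
import numpy as np

def hpdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the posterior samples."""
    x = np.sort(samples)
    k = int(np.ceil(prob * len(x)))            # interval size in samples
    widths = x[k - 1:] - x[:len(x) - k + 1]    # widths of all candidate intervals
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

rng = np.random.default_rng(1)
# Hypothetical posterior for a difference in potential intelligibility
diff = rng.normal(0.5, 0.2, size=20_000)

lo, hi = hpdi(diff, 0.95)
# A difference is deemed significant when the 95% HPDI excludes zero
print(lo > 0 or hi < 0)   # True here, since the interval lies above zero
```

Unlike an equal-tailed interval, the HPDI stays meaningful for skewed posteriors, which is why it is preferred for the latent intelligibility contrasts.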
Figure 40 and the associated table display the ranking of speakers in decreasing order based on point estimates of the latent potential intelligibility. These estimates are accompanied by their associated 95\% highest probability density intervals (HPDI). Both the table and figure clearly indicate that speaker 6 stands out as the least intelligible in the sample, followed at some distance by speakers 1, 17, and 9. In contrast, the figure highlights speaker 20 as the most intelligible, closely followed by speakers 23, 31, and 3. Conversely, Figure 41 and its associated table show summaries and the full posterior distribution for the comparison of potential intelligibility among selected speakers. The table and figure reveal that only the differences between speakers 6, 1, 17, and 9, along with the difference between speakers 20 and 3, are statistically significant, as their associated 95\% HPDI did not overlap with zero (shaded area).
SI = pred_SI(d=data_H, stanfit_obj=model10, p=0.95)
SI = SI[order(SI$mean, decreasing=T), ]
SI[, c(1:5, 9:10)]
user-defined function: retrieves SI scores for selected models
user defined function: plot ordered potential intelligibility score for speakers
Figure 40: Model 10, latent potential intelligibility of speakers. Note: Black dots and vertical lines show mean point estimates and 95% HPDI intervals.
require(rethinking)
par(mfrow=c(2,3))
for(i in idx_comp){
  dens( SI_contr$SI_raw[[i]], xlim=c(-2.5,2.5), col=rgb(0,0,0,0.7), show.HPDI=0.95,
        xlab='Difference in potential intelligibility')
  abline( v=0, lty=2, col=rgb(0,0,0,0.3))
  mtext( text=names(SI_contr$SI_raw)[i], side=3, adj=0, cex=1.1)
}
par(mfrow=c(1,1))
density plot for the differences in potential intelligibility between selected speakers
Figure 41: Model 10, potential intelligibility comparisons among selected speakers. Note: Shaded area describes the 95% highest probability density interval (HPDI)
8.3 Testing the influence of speaker-related factors on intelligibility (RQ3)
This research question illustrates how theories on intelligibility can be examined within the model’s framework. Specifically, the focus centers on assessing the influence of speaker-related factors on intelligibility, such as chronological age and hearing status. Notably, despite RQ1 indicating the suitability of Beta-proportion GLLAMM models for entropy scores, existing statistical literature suggests that, in certain scenarios, models incorporating covariate adjustment exhibit robustness to misspecification in the functional form linking an outcome and covariates, commonly referred to as the covariate-outcome relationship (Tackney et al., 2023). Consequently, this study compares all models detailed in Table 2. These models are characterized by different covariate adjustments on entropy scores or the latent potential intelligibility of speakers, namely chronological age and hearing status, while potentially exhibiting misspecification in the covariate-outcome relationship, as observed in the case of the Normal LMM.
Similar to RQ1, all criteria consistently identify the Beta-proportion GLLAMMs outlined in Models 11, 12, and 10 as the most plausible models for the data. These models exhibit the lowest values for both WAIC and PSIS, establishing them as the least deviating models among those under comparison. Moreover, Figure 42 depicts with horizontal blue lines the non-overlapping uncertainty for the models’ dWAIC and dPSIS values. This reveals that, when compared to Model 11, most models exhibit significantly distinct predictive capabilities. Models 12 and 10, however, stand out as exceptions to this pattern. This observation suggests that Models 11, 12, and 10 display the least deviation from perfect predictive accuracy in contrast to the other models. Lastly, the weight of evidence in the tables underscores that Model 11 accumulated the greatest support, followed by Model 12 and, lastly, by Model 10.
user defined function: plot of Deviance, WAIC, PSIS, and dWAIC with confidence intervals
Figure 42: WAIC and PSIS model comparison plot. Note: Black and blue points describe point estimates, and continuous horizontal lines indicate the associated uncertainty.
A closer examination of two models within this comparison set reveals the reasons behind the largest observed disparities. The Normal LMM, as outlined in Model 6, continues to face challenges in capturing underlying data patterns, resulting in predictions that are physically inconsistent, falling outside the outcome’s range. Additionally, the model persists in identifying highly unlikely and influential observations, making it inherently unreliable. In contrast, the Beta-proportion GLLAMM described by Model 12 appears to be less susceptible to ‘extreme’ scores, effectively capturing data patterns within the expected data range and thereby instilling greater confidence in the reliability of the model’s estimates. This contrast is visually depicted in Figure 43, Figure 44, Figure 45, and Figure 46.
Figure 43: Entropy scores prediction for selected models. Note: Black dots show manifest entropy scores, orange dots and vertical lines show the point estimates and 95% highest probability density intervals (HPDI) derived from model 6, blue dots and vertical lines show similar information for model 12.
user defined function: plot entropy scores and two selected models
user defined function: entropy and predicted scores density plot for selected model
Figure 44: Model 6: Entropy scores density for selected speakers. Note: Black bars denote the true data density, orange bars describe the predicted data density
user defined function: entropy and predicted scores density plot for selected model
Figure 45: Model 12: Entropy scores density for selected speakers. Note: Black bars denote the true data density, blue bars describe the predicted data density
user defined function: outliers identification for selected model
Figure 46: Outlier identification and analysis for selected models. Note: Thin and thick vertical discontinuous line indicate threshold of 0.5 and 0.7, respectively. Number pair texts indicate the observation pair of speaker and sentence index.
Considering the results in Figure 42, the model comparisons favor three distinct models: Models 10, 11, and 12. Model 10, supported by 20.4% of the evidence, estimates a single intercept $\alpha$ and no slope to explain the potential intelligibility of speakers (refer to the associated table). In contrast, supported by 45.1% of the evidence, Model 11 estimates distinct intercepts for each hearing status group, namely $\alpha_{HS[1]}$ for NH speakers and $\alpha_{HS[2]}$ for their HI/CI counterparts, while maintaining a single slope that gauges the impact of age on potential intelligibility estimates. The 95% HPDI for the comparison of intercepts $\alpha_{HS[2]}-\alpha_{HS[1]}$ reveals significant differences between NH and HI/CI speakers. Lastly, with 34.1% of the evidence, Model 12 estimates one intercept and slope per hearing status group, namely $\alpha_{HS[1]}$ and $\beta_{A,HS[1]}$ for the NH speakers, and $\alpha_{HS[2]}$ and $\beta_{A,HS[2]}$ for their HI/CI counterparts. The 95% HPDI for the comparison of intercepts and slopes reveals significant differences solely in the slopes between NH speakers and their HI/CI counterparts ($\beta_{A,HS[2]}-\beta_{A,HS[1]}$).
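Evidence percentages of this kind are Akaike-type model weights derived from the information-criterion differences. The following sketch (in Python, not the study's R code, and with invented WAIC values rather than the study's actual estimates) shows how such weights are computed:

```python
# Akaike-type model weights from WAIC values. The WAIC numbers here are
# hypothetical, chosen only to illustrate the mechanics of the computation.
import math

waic = {"Model 10": 412.3, "Model 11": 410.7, "Model 12": 411.3}  # invented
best = min(waic.values())                        # lower WAIC = better fit
delta = {m: w - best for m, w in waic.items()}   # difference to the best model
raw = {m: math.exp(-0.5 * d) for m, d in delta.items()}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}

for m, w in weights.items():
    print(f"{m}: {w:.1%} of the evidence")
```

Each model's weight is proportional to exp(-0.5 times its WAIC difference to the best model), normalized so the weights sum to one.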
However, a discerning reader can notice that these models yield conflicting conclusions regarding the influence of chronological age and hearing status on intelligibility. Model 10 implies no influence of chronological age and hearing status on the potential intelligibility of speakers. A visual inspection of Figure 47, however, reveals the reason for the model’s low support. Model 10 fails to capture the prevalent increasing age pattern observed in potential intelligibility estimates. In contrast, Model 11 identifies significant differences in potential intelligibility between NH and HI/CI speakers. The model further suggests that with the progression of chronological age, HI/CI speakers lag behind in intelligibility development, with no opportunity to catch up to their NH counterparts within the analyzed age range, as depicted in Figure 48. Finally, Model 12 indicates no significant differences in intelligibility between NH and HI/CI speakers at 68 months of age (around 6 years old). However, the model reveals distinct evolution patterns of intelligibility per unit of chronological age between different hearing status groups, with HI/CI speakers displaying a slower rate of development compared to their NH counterparts within the analyzed age range. The latter is evident in Figure 49.
user defined function: plot potential intelligibility per age and hearing status for selected model
Figure 47: Model 10, Potential intelligibility per chronological age and hearing status. Note: Colored dots denote mean point estimates, vertical lines describe the 95% highest probability density intervals (HPDI), the thick discontinuous line indicates the regression line, thin continuous lines denote regression lines sampled from the posterior distribution, and numbers indicate the speaker index.
user defined function: plot potential intelligibility per age and hearing status for selected model
Figure 48: Model 11, Potential intelligibility per chronological age and hearing status. Note: Colored dots denote mean point estimates, vertical lines describe the 95% highest probability density intervals (HPDI), the thick discontinuous line indicates the regression line, thin continuous lines denote regression lines sampled from the posterior distribution, and numbers indicate the speaker index.
user defined function: plot potential intelligibility per age and hearing status for selected model
Figure 49: Model 12, Potential intelligibility per chronological age and hearing status. Note: Colored dots denote mean point estimates, vertical lines describe the 95% highest probability density intervals (HPDI), the thick discontinuous line indicates the regression line, thin continuous lines denote regression lines sampled from the posterior distribution, and numbers indicate the speaker index.
8.4 Chain quality and information
Given the considerable number of fitted models and the resulting abundance of parameters, this section showcases the quality of, and information embedded in, the Bayesian chains through models 6 and 12 only. These models were selected because they register the highest parameter counts among those detailed in Section 7.3. It is crucial to underscore that a meticulous examination of all fitted models was conducted; all demonstrated results comparable to those chosen here for illustrative purposes.
In general, both graphical analyses and diagnostic statistics indicated that all chains exhibited low to moderate autocorrelation, explored the parameter space in a seemingly random manner, and converged to a constant mean and variance in their post-warm-up phase. Figure 50 visualizes the $\widehat{R}$ diagnostic statistic, and Figure 51 through Figure 63 illustrate the chains' graphical analysis.
user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model
Figure 63: Model 12, trace, trace rank and ACF plots for selected parameters
Moreover, the density plots and $n_{\text{eff}}$ statistics collectively confirmed that all posterior distributions are unimodal, with values centered around a mean, generated from a satisfactory number of uncorrelated sampling points, and substantively sensible when compared to the models' prior beliefs. Figure 64 visualizes the $n_{\text{eff}}$ diagnostic statistic, and Figure 66 through Figure 71 illustrate the chains' graphical analysis.
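The intuition behind $n_{\text{eff}}$ is that autocorrelated draws carry less information than independent ones. The simplified single-chain sketch below (Python, with a simulated AR(1) chain; Stan's actual multi-chain estimator with Geyer truncation is more elaborate) illustrates the computation:

```python
# Simplified effective sample size: n_eff = n / (1 + 2 * sum of
# autocorrelations). The AR(1) chain is simulated for illustration.
import random
from statistics import mean

def autocorr(x, lag):
    mu = mean(x)
    num = sum((x[i] - mu) * (x[i + lag] - mu) for i in range(len(x) - lag))
    den = sum((xi - mu) ** 2 for xi in x)
    return num / den

def n_eff(x, max_lag=200):
    total = 0.0
    for t in range(1, max_lag):
        rho = autocorr(x, t)
        if rho < 0.05:              # crude truncation once correlation dies out
            break
        total += rho
    return len(x) / (1 + 2 * total)

random.seed(3)
x = [random.gauss(0, 1)]
for _ in range(4999):               # strongly autocorrelated AR(1) draws
    x.append(0.9 * x[-1] + random.gauss(0, (1 - 0.9**2) ** 0.5))

ess = n_eff(x)
print(round(ess))                   # far fewer effective draws than 5000
```

A chain of 5,000 strongly autocorrelated draws can be worth only a few hundred independent draws, which is why chain quality must be checked before trusting posterior summaries.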
user-defined function: generation of density plot with HPDI for selected parameters
Figure 71: Model 12, density plots for selected parameters
9 Discussion
9.1 Findings
This study examined the suitability of the Bayesian Beta-proportion GLLAMM for quantitatively measuring and testing research theories related to speech intelligibility using entropy scores. The initial findings supported the assertion that Beta-proportion GLLAMMs consistently outperform Normal LMMs in predicting entropy scores. The results emphasized that models neglecting the outcome's measurement error and boundedness lead to underfitting and misspecification issues, even when robust features are integrated, as the Normal LMMs clearly illustrate.
Secondly, the study showcased the Beta-proportion GLLAMM’s proficiency in estimating the latent potential intelligibility of speakers based on manifest entropy scores. Implemented under Bayesian procedures, the proposed model offered a valuable advantage over frequentist methods by further providing the full posterior distribution of the speakers’ potential intelligibility. This provision facilitated the calculation of summaries, aiding individual rankings, and supported the comparisons among selected speakers. In both scenarios, the proposed model accounted for the inherent uncertainty in the intelligibility estimates.
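For instance, a 95% HPDI, the narrowest interval containing 95% of the posterior mass, can be computed directly from posterior draws. The sketch below does so in Python on simulated draws, not the study's posterior samples:

```python
# Hypothetical sketch: 95% highest posterior density interval (HPDI)
# from posterior draws, as used to summarize latent intelligibility.
import random

def hpdi(samples, prob=0.95):
    s = sorted(samples)
    n_keep = int(round(prob * len(s)))
    # slide a window of n_keep points and keep the narrowest one
    widths = [(s[i + n_keep - 1] - s[i], i) for i in range(len(s) - n_keep + 1)]
    _, i = min(widths)
    return s[i], s[i + n_keep - 1]

random.seed(11)
posterior = [random.betavariate(8, 2) for _ in range(4000)]  # skewed toward 1
lo, hi = hpdi(posterior)
print(f"95% HPDI: [{lo:.2f}, {hi:.2f}]")
```

For skewed posteriors, such as those arising for speakers near the boundaries of the intelligibility scale, the HPDI differs from the equal-tailed interval, which is why it is the preferred summary here.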
Thirdly, the study illustrated how the proposed model assessed the impact of speaker-related factors on potential intelligibility. The results suggested that multiple models were plausible for the observed entropy scores, indicating that different speaker-related factor theories were viable for the data, with some presenting contradictory conclusions about the influence of those factors on intelligibility. However, even when unequivocal support for one theory was not possible, the divided support among these models suggested that certain statistical issues may be hindering the model's ability to distinguish among individuals and, ultimately, among models. These issues encompass the insufficient sample size of speakers, the inadequate representation of the population of speakers, and the imprecise measurement of the latent variable of interest.
Ultimately, this study introduced researchers to innovative statistical tools that enhanced existing research models. These tools not only assessed the predictability of empirical phenomena but also quantitatively measured the latent trait of interest, namely potential intelligibility, facilitating the comparison of research theories related to this trait. However, the presented tools introduce new challenges for researchers seeking their implementation. These challenges emerge from two distinct aspects: one methodological and the other practical. In the methodological domain, researchers need familiarity with Bayesian methods and the principled formulation of assumptions regarding the data-generating processes and research inquiries. This entails understanding and addressing each of the data and research challenges within the context of a statistical (probabilistic) model. In the practical domain, researchers need familiarity with probabilistic programming languages (PPLs), which are designed for specifying and obtaining inferences from probabilistic models, the core of Bayesian methods. To ensure the successful utilization of these new statistical tools, this study addresses both challenges by providing comprehensive, step-by-step guidance in the form of this digital walk-through document.
9.2 Limitations and future research
This study provides valuable insights into the use of a novel approach to simultaneously address the different data features of entropy scores in speech intelligibility research. However, it is important to acknowledge the limitations of this study and explore potential avenues for future research.
Firstly, the study interprets potential intelligibility as an unobserved latent trait of speakers that influences the likelihood of observing a set of entropy scores. These scores, in turn, reflect the transcribers' ability to decode words in sentences produced by the same speakers. Despite this practical approach, the construct validity of the latent trait heavily depends on the listeners' appropriate understanding and execution of the transcription task. Construct validity, as defined by Cronbach and Meehl (1955), refers to the extent to which a set of manifest variables accurately represents a concept that cannot be directly measured. Because the study assumes the transcription task set by Boonen and colleagues (Boonen et al., 2021) was properly understood and executed, it expects potential intelligibility to reflect the overall speech intelligibility of speakers. However, this study does not delve into the general epistemological considerations regarding the connection between the latent variable and the concept.
Secondly, the study identified a notable absence of unequivocal support for one of the compared models. This deficiency may be attributed to factors such as the insufficient sample size of speakers, the inadequate representation of the population of speakers (i.e., selection bias), and the imprecise measurement of the latent variable. Insufficient sample size and selection bias yield data with limited outcome and covariate ranges, leading to biased and imprecise parameter estimates (Everitt & Skrondal, 2010). Furthermore, these issues, exacerbated by reduced measurement precision, can result in models with diminished statistical power and a higher risk of type I or type II errors (McElreath, 2020). Consequently, future research should consider conducting power analyses for the proposed models. This entails assessing the impact of expanding the speakers' pool on testing research theories, or of increasing the number of speech samples, transcriptions, and listeners to enhance the precision of potential intelligibility estimates. With these insights, future investigations should contemplate enlarging the speaker sample with a group that adequately represents the population of interest, while remaining mindful of the pragmatic limitations associated with transcription tasks, specifically the costs and time-intensiveness of the procedure.
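A power analysis of this kind is typically carried out by simulation. The sketch below (Python) illustrates the workflow only: it stands in a plain normal-approximation interval for a two-group difference where a full analysis would refit the Bayesian model to each simulated dataset, and all numbers are invented.

```python
# Simplified simulation-based power sketch: simulate datasets of
# increasing sample size and record how often a true group difference
# is detected. Purely illustrative; not the study's procedure.
import random
from statistics import mean, stdev

def detected(n_per_group, true_diff, sd=1.0):
    g1 = [random.gauss(0.0, sd) for _ in range(n_per_group)]
    g2 = [random.gauss(true_diff, sd) for _ in range(n_per_group)]
    diff = mean(g2) - mean(g1)
    se = ((stdev(g1) ** 2 + stdev(g2) ** 2) / n_per_group) ** 0.5
    return abs(diff) > 1.96 * se        # interval excludes zero

random.seed(5)
power = {}
for n in (10, 30, 100):                 # candidate speaker sample sizes
    power[n] = mean(detected(n, true_diff=0.5) for _ in range(2000))
    print(f"n = {n:3d} per group -> approximate power {power[n]:.2f}")
```

The same loop structure applies when the detection criterion is replaced by, for example, whether the 95% HPDI of a slope difference excludes zero under the fitted GLLAMM.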
Thirdly, the study presented an illustrative example for the investigation of research theories within the model’s framework. However, it did not offer an exhaustive evaluation of all factors influencing intelligibility, which are thoroughly explored in the works of Boons et al. (2012), Fagan et al. (2020), Gillis (2018), and Niparko et al. (2010). Consequently, the study cannot discard the presence of unobservable variables that might bias the parameter estimates, potentially impacting the inferences provided. Hence, future research should consider integrating appropriate causal hypotheses about these factors into the proposed models, as proper covariate adjustment facilitates the production of unbiased and precise parameter estimates (Cinelli, Forney, & Pearl, 2022; Deffner, Rohrer, & McElreath, 2022).
Lastly, this study proposes two directions for future exploration in speech intelligibility research. Firstly, there is an opportunity to investigate alternative methods for assessing speech intelligibility beyond transcription tasks and entropy scores. The experimental design of transcription tasks implies that the procedure can be time-intensive and costly. Thus, exploring less time-intensive or more cost-effective procedures that still offer comparable precision in intelligibility estimates could benefit researchers and speech therapists alike. An illustrative example of such a method is Comparative Judgment (CJ), where judges compare and score the perceived intensity of a trait between two stimuli (Thurstone, 1927). In the context of the intelligibility trait, the stimuli under assessment could be the speech samples uttered by two speakers. CJ serves as an ideal example, as the method has gained increasing attention within the realm of educational assessment, with several studies providing evidence for its validity in assessing various tasks within student work, as demonstrated by Pollitt (2012a, 2012b), Lesterhuis (2018), van Daal (2020), and Verhavert et al. (2019).
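One common statistical model behind CJ data is the Bradley-Terry model, which converts pairwise judgments into a ranking of latent strengths. The Python sketch below fits it with classical minorization-maximization updates to invented win counts, purely to illustrate how pairwise comparisons could yield intelligibility rankings:

```python
# Hedged sketch: Bradley-Terry model fit by MM updates. The win counts
# are invented; wins[i][j] counts how often speaker i was judged more
# intelligible than speaker j.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
n = len(wins)
p = [1.0] * n                           # latent "intelligibility" strengths

for _ in range(200):                    # MM iterations
    new_p = []
    for i in range(n):
        total_wins = sum(wins[i])
        denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                    for j in range(n) if j != i)
        new_p.append(total_wins / denom)
    s = sum(new_p)
    p = [pi / s for pi in new_p]        # normalize for identifiability

print([round(pi, 3) for pi in p])       # speaker 0 ranked most intelligible
```

In a Bayesian formulation, the strengths would instead receive priors and be sampled, which would integrate naturally with the latent-variable machinery already used in this study.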
A second avenue for exploration involves integrating diverse data types and evaluation methods to assess individuals' intelligibility. This can be accomplished by leveraging two features of Bayesian methods: their flexibility and the concept of Bayesian updating. Bayesian methods possess the flexibility to simultaneously handle various data types. Additionally, through Bayesian updating, researchers can integrate information from the posterior distribution of parameters as priors in models for subsequent evaluations. Ultimately, this could enable researchers to assess speakers' intelligibility progress without committing to a specific data type or evaluation method. This advancement could mirror the emergence of second-generation Structural Equation Models proposed by Muthén (2001), where models facilitate the combined estimation of categorical and continuous latent variables. In the context of future research, however, the proposal would facilitate the estimation of latent variables using a combination of data types and evaluation methods, contingent upon the fulfillment of construct validity by those evaluation methods.
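The Bayesian-updating idea can be shown with a toy conjugate Beta-Binomial model, where the posterior of one evaluation round becomes the prior of the next. The counts below are invented; the study's models are far richer, but the updating principle is the same.

```python
# Toy Bayesian updating: each round of (correctly decoded, total) word
# counts updates the Beta belief about an intelligibility-like proportion.
a, b = 1, 1                                  # flat Beta(1, 1) prior
batches = [(18, 25), (22, 30), (40, 45)]     # invented (correct, total) per round

for correct, total in batches:
    a += correct                             # conjugate update:
    b += total - correct                     # Beta(a + k, b + n - k)
    print(f"posterior mean so far: {a / (a + b):.3f}")
```

Because the posterior after each batch is again a Beta distribution, evaluations arriving from different methods or sessions can be folded in sequentially without refitting from scratch.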
10 Conclusions
This study highlights the effectiveness of the Bayesian Beta-proportion GLLAMM in collectively addressing several key data features when investigating unobservable and complex traits, using speech intelligibility and entropy scores as an example. The results demonstrate that the proposed model consistently outperforms the Normal LMM in predicting the empirical phenomena. Moreover, it exhibits the ability to quantify the latent potential intelligibility of speakers, allowing for the ranking and comparison of individuals based on the latent trait while accommodating the associated uncertainties. Additionally, the proposed model facilitates the exploration of research theories concerning the influence of speaker-related factors on potential intelligibility. The study indicates that integrating and comparing these theories within the model's framework is a straightforward task. However, the introduction of these innovative statistical tools presents new challenges for researchers seeking implementation. These challenges encompass the principled formulation of assumptions about the data-generating processes and research inquiries, along with the need for familiarity with probabilistic programming languages (PPLs) essential for implementing Bayesian methods. Nevertheless, the study suggests several promising avenues for future research, including power analysis, causal hypothesis formulation, and the exploration and integration of novel evaluation methods for assessing intelligibility. The insights derived from this study hold implications for both researchers and data analysts interested in quantitatively measuring and testing theories related to nuanced, unobservable constructs, while also considering the appropriate prediction of the empirical phenomena.
Baker, F. (1998). An investigation of the item parameter recovery characteristics of a Gibbs sampling procedure. Applied Psychological Measurement, 22(2), 153–169. https://doi.org/10.1177/01466216980222005
Baldwin, S., & Fellingham, G. (2013). Bayesian methods for the analysis of small sample multilevel data with a complex variance structure. Journal of Psychological Methods, 18(2), 151–164. https://doi.org/10.1037/a0030642
Boonen, N., Kloots, H., Nurzia, P., & Gillis, S. (2021). Spontaneous speech intelligibility: Early cochlear implanted children versus their normally hearing peers at seven years of age. Journal of Child Language, 1–26. https://doi.org/10.1017/S0305000921000714
Boons, T., Brokx, J., Dhooge, I., Frijns, J., Peeraer, L., Vermeulen, A., … van Wieringen, A. (2012). Predictors of spoken language development following pediatric cochlear implantation. Ear and Hearing, 33(5), 617–639. https://doi.org/10.1097/AUD.0b013e3182503e47
de Brito Trindade, D., & Espinheira, P. L. A. P. V. (2021). Beta regression model nonlinear in the parameters with additive measurement errors in variables. PLOS ONE, 16(7), 1–28. https://doi.org/10.1371/journal.pone.0254103
Brooks, S., Gelman, A., Jones, G., & Meng, X. (2011). Handbook of Markov chain Monte Carlo (1st ed.). Chapman & Hall/CRC. https://doi.org/10.1201/b10905
Chin, S., Bergeson, T., & Phan, J. (2012). Speech intelligibility and prosody production in children with cochlear implants. Journal of Communication Disorders, 45, 355–366. https://doi.org/10.1016/j.jcomdis.2012.05.003
Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
Deffner, D., Rohrer, J., & McElreath, R. (2022). A causal framework for cross-cultural generalizability. Advances in Methods and Practices in Psychological Science, 5(3). https://doi.org/10.1177/25152459221106366
Denwood, M. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Depaoli, S. (2014). The impact of inaccurate “informative” priors for growth parameters in Bayesian growth mixture modeling. Structural Equation Modeling: A Multidisciplinary Journal, 21(2), 239–252. https://doi.org/10.1080/10705511.2014.882686
Depaoli, S., & van de Schoot, R. (2017). Improving transparency and replication in bayesian statistics: The WAMBS-checklist. Psychological Methods, 22(2), 240–261. https://doi.org/10.1037/met0000065
Fagan, M., Eisenberg, L., & Johnson, K. (2020). Investigating early pre-implant predictors of language and cognitive development in children with cochlear implants. In M. Marschark & H. Knoors (Eds.), Oxford handbook of deaf studies in learning and cognition (pp. 46–95). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190054045.013.3
Flipsen, P. (2006). Measuring the intelligibility of conversational speech in children. Clinical Linguistics & Phonetics, 20(4), 303–312. https://doi.org/10.1080/02699200400024863
Freeman, V., Pisoni, D., Kronenberger, W., & Castellanos, I. (2017). Speech intelligibility and psychosocial functioning in deaf children and teens with cochlear implants. Journal of Deaf Studies and Deaf Education, 22(3), 278–289. https://doi.org/10.1093/deafed/enx001
Gabry, J., & Češnovar, R. (2022). Cmdstanr: R interface to ’CmdStan’.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., & Rubin, D. (2014). Bayesian data analysis (3rd ed.). Chapman & Hall/CRC.
Gillis, S. (2018). Speech and language in congenitally deaf children with a cochlear implant. In E. Dattner & D. Ravid (Eds.), Handbook of communication disorders: Theoretical, empirical, and applied linguistic perspectives (pp. 765–792). De Gruyter Mouton. https://doi.org/10.1515/9781614514909-038
Kangmennaang, J., Siiba, A., & Bisung, E. (2023). Does trust mediate the relationship between experiences of discrimination and health care access and utilization among minoritized canadians during COVID-19 pandemic? Journal of Racial and Ethnic Health Disparities. https://doi.org/10.1007/s40615-023-01809-w
Kent, R. D., Miolo, G., & Bloedel, S. (1994). The intelligibility of children’s speech: A review of evaluation procedures. American Journal of Speech-Language Pathology, 3(2), 81–95. https://doi.org/10.1044/1058-0360.0302.81
Kim, S., & Cohen, A. (1999). Accuracy of parameter estimation in Gibbs sampling under the two-parameter logistic model. Paper presented at the Annual Meeting of the American Educational Research Association. Retrieved from https://eric.ed.gov/?id=ED430012
Kullback, S., & Leibler, R. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86. Retrieved from http://www.jstor.org/stable/2236703
Lagerberg, T., Asberg, J., Hartelius, L., & Persson, C. (2014). Assessment of intelligibility using children’s spontaneous speech: Methodological aspects. International Journal of Language and Communication Disorders, 49(2), 228–239. https://doi.org/10.1111/1460-6984.12067
Lambert, P., Sutton, A., Burton, P., Abrams, K., & Jones, D. (2005). How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Statistics in Medicine, 24(15), 2401–2428. https://doi.org/10.1002/sim.2112
Lee, Y., & Nelder, J. A. (1996). Hierarchical generalized linear models. Journal of the Royal Statistical Society: Series B (Methodological), 58(4), 619–656. https://doi.org/10.1111/j.2517-6161.1996.tb02105.x
Lesterhuis, M. (2018). The validity of comparative judgement for assessing text quality: An assessor’s perspective (PhD thesis). University of Antwerp.
McElreath, R. (2021). Rethinking: Statistical rethinking book package.
Muthén, B. (2001). Second-generation structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class–latent growth modeling. In L. Collins & A. Sayer (Eds.), New methods for the analysis of change (pp. 291–322). American Psychological Association. https://doi.org/10.1037/10409-010
Neal, R. (2003). Slice sampling. The Annals of Statistics, 31(3), 705–741. https://doi.org/10.1214/aos/1056562461
Niparko, J., Tobey, E., Thal, D., Eisenberg, L., Wang, N., Quittner, A., & Fink, N. (2010). Spoken language development in children following cochlear implantation. JAMA, 303(15), 1498–1506. https://doi.org/10.1001/jama.2010.451
Plummer, M., Best, N., Cowles, K., & Vines, K. (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7–11. Retrieved from https://journal.r-project.org/archive/
Pollitt, A. (2012a). Comparative judgement for assessment. International Journal of Technology and Design Education, 22(2), 157–170. https://doi.org/10.1007/s10798-011-9189-x
Pollitt, A. (2012b). The method of adaptive comparative judgement. Assessment in Education: Principles, Policy and Practice, 19(3), 281–300. https://doi.org/10.1080/0969594X.2012.665354
R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural equation modeling. Psychometrika, 69(2), 167–190. https://doi.org/10.1007/BF02295939
Seaman, J. W., III, Seaman, J. W., Jr., & Stamey, J. D. (2012). Hidden dangers of specifying noninformative priors. The American Statistician, 66(2), 77–84. https://doi.org/10.1080/00031305.2012.695938
Shmueli, G., & Koppius, O. (2011). Predictive analytics in information systems research. MIS Quarterly, 35(3), 553–572. https://doi.org/10.2307/23042796
Spiegelhalter, D., Best, N., Carlin, B., & van der Linde, A. (2002). Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society Series B: Statistical Methodology, 64(4), 583–639. https://doi.org/10.1111/1467-9868.00353
Stan Development Team. (2020). RStan: The R interface to Stan. Retrieved from http://mc-stan.org/
Stan Development Team. (2021). Stan modeling language users guide and reference manual, version 2.26. Vienna, Austria. Retrieved from https://mc-stan.org
Tackney, M., Morris, T., White, I., Leyrat, C., Diaz-Ordaz, K., & Williamson, E. (2023). A comparison of covariate adjustment approaches under model misspecification in individually randomized trials. Trials, 24(14). https://doi.org/10.1186/s13063-022-06967-6
van Daal, T. (2020). Making a choice is not easy?!: Unravelling the task difficulty of comparative judgement to assess student work (PhD thesis). University of Antwerp.
van Heuven, V. (2008). Making sense of strange sounds: (Mutual) intelligibility of related language varieties. A review. International Journal of Humanities and Arts Computing, 2(1-2), 39–62. https://doi.org/10.3366/E1753854809000305
Vehtari, A., Gabry, J., Magnusson, M., Yao, Y., Bürkner, P., Paananen, T., & Gelman, A. (2023). loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models. Retrieved from https://mc-stan.org/loo/
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, PC. (2021a). Rank-Normalization, Folding, and Localization: An Improved \widehat{R} for Assessing Convergence of MCMC (with Discussion). Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., & Gabry, J. (2021b). Pareto smoothed importance sampling. Retrieved from https://arxiv.org/abs/1507.02646
Verhavert, S., Bouwer, R., Donche, V., & De Maeyer, S. (2019). A meta-analysis on the reliability of comparative judgement. Assessment in Education: Principles, Policy and Practice, 26(5), 541–562. https://doi.org/10.1080/0969594X.2019.1602027
Whitehill, T., & Chau, C. (2004). Single-word intelligibility in speakers with repaired cleft palate. Clinical Linguistics and Phonetics, 18, 341–355. https://doi.org/10.1080/02699200410001663344
Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 1–20. Retrieved from http://www.jstatsoft.org/v21/i12/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr
Footnotes
For a thorough explanation of Bayesian inference procedures, the reader can refer to Kruschke (2015) or McElreath (2020).↩︎
The reader can refer to Brooks et al. (2011) for a detailed treatment on MCMC methods.↩︎
An interested reader can further refer to McElreath (2020) for a detailed explanation of grid approximation.↩︎
An interested reader can refer to McElreath (2020), Gorinova et al. (2019), and Neal (2003).↩︎
Source Code
---title: 'Walk-through'subtitle: 'Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores'author: - name: Rivera Espejo, Jose (corresponding) email: JoseManuel.RiveraEspejo@uantwerpen.be url: https://www.uantwerpen.be/en/staff/jose-manuel-rivera-espejo_23166/ attributes: corresponding: true affiliations: - name: University of Antwerp department: Training and Education Sciences city: Antwerp country: Belgium postal-code: 2000 - name: De Maeyer, Sven email: sven.demaeyer@uantwerpen.be url: https://www.uantwerpen.be/en/staff/sven-demaeyer/ attributes: corresponding: false affiliations: - name: University of Antwerp department: Training and Education Sciences city: Antwerp country: Belgium postal-code: 2000 - name: Gillis, Steven email: steven.gillis@uantwerpen.be url: https://www.clips.uantwerpen.be/~gillis/index.html attributes: corresponding: false affiliations: - name: University of Antwerp department: Computational Linguistics, and Psycholinguistics Research Centre city: Antwerp country: Belgium postal-code: 2000date: todaybibliography: bibliography.bib# geometry:# - left=1.0in# - textwidth=4.5in# - marginparsep=2in# - marginparwidth=2.25inexecute: cache: true echo: true warning: false error: falseformat: html: cite-method: citeproc csl: apa-6th-edition.csl keep-tex: true embed-resources: true link-external-icon: true link-external-newwindow: true page-layout: article theme: light: materia dark: darkly fontsize: 11pt smooth-scroll: true toc: true toc-depth: 3 toc-expand: 1 toc-title: 'Contents' toc-location: left number-sections: true # number-depth: 3 anchor-sections: true html-math-method: katex citations-hover: true footnotes-hover: true reference-location: document code-fold: true code-summary: "Code" code-overflow: scroll code-tools: true code-annotations: hover code-line-numbers: false code-copy: true tbl-cap-location: top fig-cap-location: top---# AimThe purpose of this walk-through is to 
improve the transparency and replicability of the analysis for the study **Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores** (in press.). This digital document contains all the code and materials utilized in the study. Furthermore, the walk-through meticulously follows the *When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics checklist (WAMBS checklist)* developed by Depaoli & van de Schoot [-@Depaoli_et_al_2017]. This checklist outlines the ten crucial points that need careful scrutiny when employing Bayesian inference procedures. ::: {.column-margin}**_WAMBS checklist_** Questionnaire outlining the ten crucial points that need careful scrutiny when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis [@Depaoli_et_al_2017].:::# OrganizationIn this walk-through, @sec-interludes introduce various background topics that are relevant to the present study. These topics enable readers to progress smoothly through this research. Specifically, @sec-interlude1 provides a brief explanation of how Bayesian inference procedures work and their importance for this research. @sec-interlude2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role on modeling bounded data. @sec-interlude3 explain the (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. @sec-interlude4 illustrate the concept of measurement error and the role of latent variables to overcome the problems arising from it. Lastly, @sec-interlude5 explains the effects of the data distributional departures on the parameter estimates, and its importance for this research.The specific analysis for this study are elaborated from section @sec-introduction onwards. 
In particular, @sec-introduction elaborates on the general context, gaps, and main purpose of the study. @sec-RQs introduces the research questions that guide this study. @sec-data explores the data and its implications. @sec-methods thoroughly develops the methods to analyze the data. @sec-results provides answers to the research questions at hand. @sec-discussion discusses the findings, limitations, and future research derived from this study. Lastly, @sec-conclusions provides the concluding thoughts for the study.

The R packages utilized in the production of this document can be divided into three groups. First, the packages utilized to generate this document: `RColorBrewer` [@RColorBrewer_2022] and `quarto` [@Quarto_2022]. Second, the packages used for handling the data: `stringr` [@stringr_2022], `dplyr` [@dplyr_2023], `tidyverse` [@tidyverse_2019], and `reshape2` [@reshape_2007]. Lastly, the packages used for the Bayesian implementation: `coda` [@coda_2006], `loo` [@loo_2023; @PSIS_2021], `cmdstanr` [@cmdstanr_2022], `rstan` [@RStan_2020], `runjags` [@runjags_2016], and `rethinking` [@rethinking_2021].

# Interludes {#sec-interludes}

## Bayesian inference {#sec-interlude1}

### Theory {#sec-bayesian_theory}

Bayesian inference is an approach to statistical modeling and inference that is primarily based on *Bayes' theorem*. The procedure aims to derive appropriate inference statements about a set of parameters by revising and updating their occurrence probabilities in light of new evidence [@Everitt_et_al_2010]. The procedure consists of defining the model assumptions in the form of a *likelihood* for the outcome and a set of *prior distributions* for the parameters of interest. Upon observing empirical data, these priors are updated to *posterior distributions* following Bayes' rule [@Jeffreys_1998], from which the statistical inferences are derived [^1].
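The update rule just described can be made concrete with a minimal sketch, before the regression example that follows. The sketch is illustrative only (it is not part of the study's code, and the candidate values and data are assumed): a coin bias restricted to three candidate values is updated after four observed flips.

```r
# Bayes' rule on a discrete parameter space (illustrative sketch only):
# a coin bias 'theta' assumed to take one of three candidate values
theta = c(0.25, 0.50, 0.75)              # candidate parameter values
prior = rep( 1/3, 3 )                    # uniform prior over the candidates
y = c(1, 1, 0, 1)                        # assumed observed flips (1 = heads)
lik = sapply( theta, function(t) prod( dbinom( y, size=1, prob=t ) ) )
post = ( lik * prior ) / sum( lik * prior )  # posterior: normalized prior x likelihood
round( post, 3 )                         # most posterior mass on theta = 0.75
```

The same two ingredients, a prior over candidate values and a product of individual observation likelihoods, reappear in the grid approximation for the regression model below; only the parameter and the likelihood change.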
As an example, a simple linear regression model with a parameter $\beta$ can be encoded under the Bayesian inference paradigm in the following form:

[^1]: For a thorough explanation of Bayesian inference procedures the reader can refer to Kruschke [-@Kruschke_2015] or McElreath [-@McElreath_2020].

::: {.column-margin}
**_Bayesian inference_** Approach to statistical modeling and inference that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence [@Everitt_et_al_2010].
:::

$$
\begin{align*}
P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) }
\end{align*}
$$ {#eq-bayes1}

where $P( Y | \beta, X )$ defines the *likelihood* of the outcome, which represents the assumed probability distribution for the outcome $Y$, given the parameter $\beta$ and covariate $X$, i.e., the distribution that describes the assumption about the underlying process that gives rise to the data [@Everitt_et_al_2010].

::: {.column-margin}
**_Likelihood_** Probability distribution that describes the assumption about the underlying process that gives rise to the data [@Everitt_et_al_2010].
:::

$P( \beta )$ defines the *prior distribution* of the parameter $\beta$.
A *prior* is a probability distribution summarizing the information about a parameter known or assumed before observing any empirical data [@Everitt_et_al_2010].

::: {.column-margin}
**_Prior distribution_** Probability distribution summarizing the information about a parameter known or assumed before observing any empirical data [@Everitt_et_al_2010].
:::

$P( Y )$ defines the probability distribution of the data, which represents the *evidence* of the observed empirical data.

As a result, $P( \beta | Y, X )$, which denotes the *posterior distribution* of the parameter, describes the probability distribution of $\beta$ after observing empirical data.

::: {.column-margin}
**_Posterior distribution_** Probability distribution summarizing the information about a parameter after observing empirical data [@Everitt_et_al_2010].
:::

Before implementing the Bayesian inference procedures, two important concepts related to @eq-bayes1 need to be understood. First, the evidence of the empirical data $P(Y)$ serves as a normalizing constant. This is just another way of saying that the numerator in the equation is rescaled by a constant obtained from calculating $P(Y)$. Consequently, without loss of generality, the equation can be succinctly rewritten in the following form:

$$
\begin{align*}
P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta ) \\
\end{align*}
$$ {#eq-bayes2}

where $\propto$ denotes the proportionality symbol. This implies that the posterior distribution of $\beta$ is proportional (up to a constant) to the product of the outcome's likelihood and the parameter's prior distribution. This definition makes the *calculation* of the posterior distribution easier, by separating the parameter's *updating process* from the integration of new empirical data (this will be clearly seen in the code provided in @sec-howitworks).

Second, a dataset usually has multiple observations of the outcome $Y$ and covariate $X$, in the form of $y_{i}$ and $x_{i}$.
Therefore, by the laws of probability and assuming independence among the observations, the likelihood of the full dataset can be rewritten as the product of all individual observation likelihoods. Consequently, @eq-bayes2 can also be rewritten as follows:

$$
\begin{align*}
P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta )
\end{align*}
$$ {#eq-bayes3}

### Estimation methods {#sec-estimation_methods}

Several methods within the Bayesian inference procedures can be utilized to *estimate* the posterior distribution of the parameter, and most of these fall into the category of *Markov Chain Monte Carlo (MCMC)* methods. *MCMC* methods indirectly simulate random observations from probability distributions using stochastic processes [@Everitt_et_al_2010] [^2].

[^2]: The reader can refer to Brooks et al. [-@Gelman_et_al_2011] for a detailed treatment of MCMC methods.

::: {.column-margin}
**_Markov Chain Monte Carlo (MCMC)_** Methods to indirectly simulate random observations from probability distributions using stochastic processes [@Everitt_et_al_2010].
:::

However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the *grid approximation* method. Through this method, an excellent approximation of the parameter's posterior distribution can be achieved by considering a finite candidate list of parameter values. This method is used in @sec-howitworks to illustrate how Bayesian inference works [^3].

[^3]: An interested reader can further refer to McElreath [-@McElreath_2020] for a detailed explanation of grid approximation.

::: {.column-margin}
**_Grid approximation_** Method to indirectly simulate random observations from low-dimensional continuous probability distributions, by considering a finite candidate list of parameter values [@McElreath_2020].
:::

### How does it work? {#sec-howitworks}

A simple Bayesian linear regression model can be written in the following form:

$$
\begin{align*}
y_{i} &= \beta \cdot x_{i} + e_{i} \\
e_{i} &\sim \text{Normal}( 0, 1 ) \\
\beta &\sim \text{Uniform}( -20, +20 )
\end{align*}
$$

where $y_{i}$ denotes the outcome's observation $i$, $\beta$ the expected effect of the observed covariate $x_{i}$ on the outcome, and $e_{i}$ the outcome's residual in observation $i$. Furthermore, the model assumes the residual $e_{i}$ is normally distributed with mean zero and standard deviation equal to one. Lastly, prior to observing any data, it is assumed that $\beta$ is uniformly distributed within the range $[-20,+20]$.

However, a more convenient generalized manner to represent the same linear regression model is as follows:

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, +20 )
\end{align*}
$$

In this definition, the components of the Bayesian inference procedure detailed in @sec-bayesian_theory are more easily spotted. First, regarding the likelihood, the outcome is assumed to be normally distributed with mean $\mu_{i}$ and standard deviation equal to one. Second, regarding the prior, $\beta$ is assumed to be uniformly distributed within the range $[-20,+20]$. Additionally, the equations reveal that the mean of the outcome $\mu_{i}$ is modeled by a linear predictor composed of the covariate $x_{i}$ and its effect on the outcome, $\beta$.

For illustration purposes, a simulated regression with $n=100$ observations was generated assuming $\beta=0.2$. @fig-regression_simulation shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of $\beta$. Two noteworthy results emerge from the approach.
Firstly, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Secondly, when considering a dataset with $n=100$ observations, the influence of the prior on the posterior distribution of $\beta$ is negligible. Specifically, prior to observing any data, assuming that $\beta$ could take any value within the range of $[-20,+20]$ with equal probability (left panel of @fig-bayesian_inference) did not have a substantial impact on the distribution of $\beta$ after empirical data was observed (right panel of @fig-bayesian_inference).

```{r}
#| label: code-regression_simulation
#| fig-cap: ''
set.seed(12345)                    # <1>
n = 100                            # <2>
b = 0.2                            # <3>
x = rnorm( n=n, mean=0, sd=1 )     # <4>
mu_y = b*x                         # <5>
y = rnorm( n=n, mean=mu_y, sd=1 )  # <6>
```

1. replication seed
2. simulation sample size
3. covariate effect
4. covariate simulation
5. linear predictor on outcome mean
6. outcome simulation

```{r}
#| label: code-bayesian_inference
#| fig-cap: ''
# grid approximation
Ngp = 1000                                                    # <1>
b_cand = seq( from=-20, to=20, length.out=Ngp )               # <2>
udf = function(i){ b_cand[i]*x }                              # <3>
mu_y = sapply( 1:length(b_cand), udf )                        # <4>
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }  # <5>
y_lik = sapply( 1:length(b_cand), udf )                       # <6>
b_prior = rep( 1/40, length(b_cand) )                         # <7>
b_prop = y_lik * b_prior                                      # <8>
b_post = b_prop / sum(b_prop)                                 # <9>
```

1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods
6. outcome data likelihood
7. uniform prior distribution for parameter (min=-20, max=20)
8. proportional posterior distribution for parameter
9. posterior distribution for parameter

```{r}
#| label: code-bayesian_summary
#| fig-cap: ''
paste0( 'true beta = ', b )                            # <1>
b_exp = sum( b_cand * b_post )                         # <2>
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ]                  # <3>
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) )   # <4>
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] )                   # <5>
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
```

1. true value for the parameter
2. expected value for the parameter
3. maximum probability value for the parameter
4. standard deviation for the parameter
5. probability that the parameter is greater than zero

```{r}
#| label: fig-regression_simulation
#| fig-cap: 'Outcome simulation'
#| fig-height: 4
#| fig-width: 5
plot( x, y, xlim=c(-3,3), ylim=c(-3,3),  # <1>
      pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
```

1. simulation plot

```{r}
#| label: fig-bayesian_inference
#| fig-cap: 'Bayesian inference: grid approximation'
#| fig-height: 4
#| fig-width: 10
par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5),  # <1>
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1),       # <2>
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2, col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))
```

1. prior distribution density plot
2.
posterior distribution density plot

### Priors and their effects {#sec-prior_effects}

Prior to observing empirical data, assuming the parameter could take any value within the range of $[-20,+20]$ with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept is illustrated with @fig-prior_effects1 through @fig-prior_effects3, where three different types of priors are used to encode three levels of uncertainty about the parameter $\beta$.

```{r}
#| label: code-prior_effects
#| fig-cap: ''
# grid approximation
Ngp = 1000                                                          # <1>
post = data.frame( b_cand=seq( from=-20, to=20, length.out=Ngp ) )  # <2>
ud_func = function(i){ post$b_cand[i]*x }                           # <3>
mu_y = sapply( 1:length(post$b_cand), ud_func )                     # <4>
ud_func = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) }    # <5>
y_lik = sapply( 1:length(post$b_cand), ud_func )                    # <6>
post$b_prior1 = rep( 1/40, length(post$b_cand) )                    # <7>
post$b_prior2 = dnorm( post$b_cand, mean=0, sd=0.5 )                # <8>
post$b_prior3 = dnorm( post$b_cand, mean=0.2, sd=0.05 )             # <9>
nam = c()
for( i in 1:3 ){                                                    # <10>
  b_prop = y_lik * post[, paste0('b_prior',i) ]
  nam = c(nam, paste0('b_post',i) )
  post = cbind(post, data.frame( b_prop / sum(b_prop) ) )
}
names(post)[5:7] = nam
```

1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods
6. outcome data likelihood
7. prior 1: uniform prior distribution (min=-20, max=+20)
8. prior 2: normal prior distribution (mean=0, sd=0.5)
9. prior 3: normal prior distribution (mean=0.2, sd=0.05)
10. posterior distribution for each prior

First, the distribution depicted in @fig-prior_effects1 assumes $\beta \sim \text{Uniform}(-20, +20)$ (similar to what is observed in @sec-howitworks).
The distribution does not restrain the effect of $\beta$ to be more probable in any particular range within $[-20, +20]$. This type of distribution is commonly referred to as a *non-informative prior*. A *non-informative prior* reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space [@Everitt_et_al_2010].

::: {.column-margin}
**_Non-informative priors_** Prior that reflects the distributional commitment of a parameter to a wide range of values within a specific parameter space [@Everitt_et_al_2010].
:::

```{r}
#| label: fig-prior_effects1
#| fig-cap: 'Bayesian inference: posterior distributions with non-informative prior distribution.'
#| fig-height: 4
#| fig-width: 10
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior1')], type='l',   # <1>
      xlim=c(-1.5,1.5), main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post1')], type='l',   # <2>
      xlim=c(-1,1), main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
```

1. prior distribution density plot
2. posterior distribution density plot

Second, the distribution described in @fig-prior_effects2 assumes $\beta \sim \text{Normal}(0, 0.5)$. Consequently, the effect of $\beta$ is more probable within the range $[-1,+1]$, with less probability associated with parameter values outside this range. This is an example of a *weakly-informative prior distribution*. *Weakly informative priors* reflect the distributional commitment of a parameter to a weakly constrained range of values within a realistic parameter space [@McElreath_2020].
::: {.column-margin}
**_Weakly informative priors_** Prior that reflects the distributional commitment of a parameter to a weakly constrained range of values within a realistic parameter space [@McElreath_2020].
:::

```{r}
#| label: fig-prior_effects2
#| fig-cap: 'Bayesian inference: posterior distributions with weakly-informative prior distribution.'
#| fig-height: 4
#| fig-width: 10
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior2')], type='l',   # <1>
      xlim=c(-1.5,1.5), main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post2')], type='l',   # <2>
      xlim=c(-1,1), main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
```

1. prior distribution density plot
2. posterior distribution density plot

Third, the distribution described in @fig-prior_effects3 assumes $\beta \sim \text{Normal}(0.2, 0.05)$. As a result, the effect of $\beta$ is more probable within the range $[0.1,0.3]$, with less probability associated with parameter values outside this range. This is an example of an *informative prior distribution*. *Informative priors* are distributions that express specific and definite information about a parameter [@McElreath_2020].
::: {.column-margin}
**_Informative priors_** Prior distributions that express specific and definite information about a parameter [@McElreath_2020].
:::

```{r}
#| label: fig-prior_effects3
#| fig-cap: 'Bayesian inference: posterior distributions with informative prior distributions.'
#| fig-height: 4
#| fig-width: 10
par(mfrow=c(1,2))
plot( post[, c('b_cand','b_prior3')], type='l',   # <1>
      xlim=c(-1.5,1.5), main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( post[, c( 'b_cand','b_post3')], type='l',   # <2>
      xlim=c(-1,1), main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b), lty=2, col=c('gray','blue') )
par(mfrow=c(1,1))
```

1. prior distribution density plot
2. posterior distribution density plot

Lastly, regarding the influence of different priors on the posterior distributions, @fig-prior_effects1 and @fig-prior_effects2 reveal that non-informative and weakly-informative priors have a negligible influence on the posterior distribution. Both priors result in similar posteriors. Furthermore, the figures show that a data sample size of $n=100$ is still not enough to provide an unbiased and precise estimation of the true effect. In contrast, @fig-prior_effects3 shows that informative priors can have a meaningful influence on the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. These results show that when the data sample size is not sufficiently large, the prior assumptions can play a significant role in obtaining appropriate parameter estimates.

### What are Hyperpriors? {#sec-hyperpriors}

In cases requiring greater modeling flexibility, a more refined representation of the parameters' priors can be defined in terms of hyperparameters and hyperpriors.
*Hyperparameters* refer to parameters indexing a family of possible prior distributions for the original parameter, while *hyperpriors* are prior distributions for such hyperparameters [@Everitt_et_al_2010].

::: {.column-margin}
**_Hyperparameters_** Parameters $\theta_{2}$ that index a family of possible prior distributions for another parameter $\theta_{1}$ [@Everitt_et_al_2010].
:::

::: {.column-margin}
**_Hyperpriors_** Prior distributions for hyperparameters [@Everitt_et_al_2010].
:::

A simple example of the use of hyperpriors would be to define the regression model shown in @sec-howitworks in the following form:

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\
v &\sim \text{Normal}(0, 3)
\end{align*}
$$

where $v$ defines the *hyperparameter* for the parameter $\beta$, and its associated distribution defines its *hyperprior*.

However, setting prior distributions through hyperparameters brings its own challenges. One notable challenge pertains to the geometry of the parameter's sample space. Prior probabilistic representations defined in terms of hyperparameters sometimes exhibit simpler sample geometries compared to simple priors [^7]. The re-parametrization of priors into such simpler sample geometries leads to the notion of *non-centered priors*. In this approach, a parameter's prior distribution is expressed in terms of a hyperparameter, which is defined by a transformation of the original parameter of interest [@Gorinova_et_al_2019]. By incorporating *non-centered priors*, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures.
To illustrate, a straightforward example of a non-centered reparametrization of a prior can be demonstrated as follows:

::: {.column-margin}
**_Non-centered priors_** Expression of a parameter's distribution in terms of a hyperparameter defined by a transformation of the original parameter of interest [@Gorinova_et_al_2019].
:::

[^7]: An interested reader can refer to McElreath [-@McElreath_2020], Gorinova et al. [-@Gorinova_et_al_2019], and Neal [-@Neal_2003].

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &= z \cdot \text{exp}(v) \\
v &\sim \text{Normal}(0, 3) \\
z &\sim \text{Normal}( 0, 1 )
\end{align*}
$$

where $z$ is a hyperparameter sampled independently from $v$, and the parameter of interest $\beta$ is obtained as a transformation of the two hyperparameters. @fig-reparametrization illustrates the differences in sampling geometries between a centered and a non-centered parametrization. It is evident that the sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel, and as a result, Bayesian inference procedures have a harder time sampling from the former distribution than from the latter.

```{r}
#| label: code-reparametrization
#| fig-cap: ''
n = 5000                                  # <1>
v = rnorm( n=n, mean=0, sd=1 )            # <2>
z = rnorm( n=n, mean=0, sd=1 )
b_cent = rnorm( n=n, mean=0, sd=exp(v) )  # <3>
b_non = z*exp(v)                          # <4>
```

1. simulation sample size
2. hyperparameter simulation
3. centered parametrization simulation
4. non-centered parametrization simulation

```{r}
#| label: fig-reparametrization
#| fig-cap: 'Centered and non-centered parameter spaces'
#| fig-height: 4
#| fig-width: 10
par( mfrow=c(1,2) )
plot( b_cent, v, pch=19, col=rgb(0,0,0,alpha=0.1),  # <1>
      xlab=expression(beta), ylab=expression(v),
      main='Centered parametrization' )
plot( z, v, pch=19, col=rgb(0,0,0,alpha=0.1),       # <2>
      xlab=expression(z), ylab=expression(v),
      main='Non-centered parametrization' )
par( mfrow=c(1,1) )
```

1. plot of centered parametrization
2.
plot of non-centered parametrization

### Importance {#sec-whybayesian}

The selection of the Bayesian approach was based on three key properties. Firstly, empirical evidence from prior research demonstrates that Bayesian methods outperform frequentist methods, particularly in handling complex and over-parameterized models [@Baker_1998; @Kim_1999]. This superiority is evident when dealing with complex models, like the proposed GLLAMM, that are challenging to program or are not viable under frequentist methods [@Depaoli_2014].

Secondly, the approach allows for the incorporation of prior information, ensuring that certain parameters are confined within specified boundaries. This helps mitigate the non-convergence or improper parameter estimation issues commonly observed in complex models under frequentist methods [@Martin_et_al_1975; @Seaman_et_al_2011]. In this study, for example, this property was leveraged to incorporate information about the variances of random effects and constrain them to be positive.

Lastly, the Bayesian approach demonstrates proficiency in handling relatively small sample sizes [@Baldwin_et_al_2013; @Lambert_et_al_2005; @Depaoli_2014]. In this case, despite the study dealing with $2,263$ entropy scores, these were derived from a modest sample of $32$ speakers, from whom the inferences are drawn. Consequently, reliance on the asymptotic properties of frequentist methods may not be warranted in this context, underscoring the pertinence of this property to the current study.

::: {.column-margin}
**_Benefits of Bayesian inference procedures_** More suitable to deal with:

1. Complex or highly-parameterized models
2. Parameter constraints
3. Small sample sizes
:::

## A tale of two distributions {#sec-interlude2}

### The normal distribution {#sec-normal_dist}

A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line $\left( y_{i} \in (-\infty, \infty) \right)$.
The distribution is characterized by two independent parameters: the mean $\mu$ and the standard deviation $\sigma$ [@Everitt_et_al_2010]. Thus, a random variable can take on values that are gathered around a mean $\mu$, with some values dispersed based on some amount of deviation $\sigma$, without any restriction. Importantly, by definition of the normal distribution, the *location* (mean) of the distribution does not influence its *spread* (deviation).

@fig-normal_dist illustrates how the distribution of an outcome changes with different values of $\mu$ and $\sigma$. The left panel demonstrates that the distribution of the outcome can shift in terms of its location based on the value of $\mu$. The right panel shows how the distribution of the outcome can become narrower or wider based on the values of $\sigma$. It is noteworthy that alterations in the mean $\mu$ of the distribution have no impact on its standard deviation $\sigma$.

```{r}
#| label: fig-normal_dist
#| fig-cap: 'Normal distribution with different mean and standard deviations'
#| fig-height: 4
#| fig-width: 10
require(rethinking)     # <1>
mu = c(-1.5, 0, 1.5)    # <2>
sigma = c(1.5, 1, 0.5)
par(mfrow=c(1,2))
cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
  if(i==1){
    curve( dnorm(x, mean=mu[i], sd=1),  # <3>
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==-1.5, mu[2]==0, mu[3]==+1.5, sigma==1) )
  } else{
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
cp = sapply( 1:length(sigma), col.alpha, alpha=0.7)
for(i in 1:length(sigma)){
  if(i==1){
    curve( dnorm(x, mean=0, sd=sigma[i]),  # <4>
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( sigma[1]==1.5, sigma[2]==1, sigma[3]==0.5, mu==0) )
  } else{
    curve( dnorm(x, mean=0, sd=sigma[i]),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
```

1. required package
2. parameters to plot: means and standard deviations
3. plotting the normal distribution with different 'mu' and 'sigma=1'
4. plotting the normal distribution with 'mu=0' and different sigma's

### The beta-proportion distribution {#sec-beta_dist}

A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one $\left( y_{i} \in [0, 1] \right)$. The distribution is characterized by two parameters: the mean $\mu$ and the *sample size* $M$ [@Everitt_et_al_2010]. This implies that a random variable can take on values restricted within the unit interval, centered around a mean $\mu$, with some values being more dispersed based on the *sample size* $M$. Additionally, two characteristics define the distribution. Firstly, like the random variable, the mean of the distribution can only take values within the unit interval ($\mu \in [0,1]$). Secondly, the mean and sample size parameters are no longer independent of each other.

@fig-betaprop_dist illustrates how an outcome with a beta-proportion distribution changes with different values of $\mu$ and $M$.
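For readers more familiar with the standard $\text{Beta}(\alpha, \beta)$ shape parameterization, the mean and sample-size form used here maps onto it as $\alpha = \mu M$ and $\beta = (1 - \mu) M$, which is what `dbeta2()` from the `rethinking` package computes under the hood. A minimal base-R sketch of this mapping (the helper name `dbeta_muM` is ours, for illustration only):

```r
# beta-proportion density via the standard beta: alpha = mu*M, beta = (1-mu)*M
# (illustrative helper; mirrors rethinking::dbeta2 using only base R)
dbeta_muM = function(x, mu, M) dbeta( x, shape1=mu*M, shape2=(1-mu)*M )

dbeta_muM( 0.5, mu=0.3, M=10 )  # same density as dbeta( 0.5, shape1=3, shape2=7 )
# the mean of the implied beta distribution recovers mu:
integrate( function(x) x*dbeta_muM(x, mu=0.3, M=10), 0, 1 )$value  # ~0.3
```

Since $\alpha + \beta = M$, a larger sample size $M$ concentrates the distribution around $\mu$, which is exactly the pattern visible in the right panel of @fig-betaprop_dist.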
The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.

```{r}
#| label: fig-betaprop_dist
#| fig-cap: 'Beta-proportion distribution with different mean and sample sizes'
#| fig-height: 4
#| fig-width: 10
require(rethinking)    # <1>
mu = c(0.2, 0.5, 0.8)  # <2>
M = c(2, 5, 20)
par(mfrow=c(1,2))
cp = sapply( 1:length(mu), col.alpha, alpha=0.7)
for(i in 1:length(mu)){
  if(i==1){
    curve( dbeta2(x, prob=mu[i], theta=10),  # <3>
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==0.2, mu[2]==0.5, mu[3]==0.8, M==10) )
  } else{
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
cp = sapply( 1:length(M), col.alpha, alpha=0.7)
for(i in 1:length(M)){
  if(i==1){
    curve( dbeta2(x, prob=0.3, theta=M[i]),  # <4>
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0.3, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( M[1]==2, M[2]==5, M[3]==20, mu==0.3) )
  } else{
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
```

1. required package
2. parameters to plot: means and 'sample sizes'
3. plotting the beta-proportion distribution with different 'mu' and 'M=10'
4. plotting the beta-proportion distribution with 'mu=0.3' and different M's

### Importance {#sec-whybeta}

It is crucial to comprehend what it signifies for an outcome to follow a normal distribution, as the assumption of normally distributed outcomes is ubiquitous in speech intelligibility research [see @Boonen_et_al_2021; @Flipsen_2006; @Lagerberg_et_al_2014].
In contrast, the significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normally distributed *bounded* outcomes, such as the entropy scores utilized in this study. *Boundedness* refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur [@Lebl_2022]. Neglecting the bounded nature of an outcome can lead, at best, to *underfitting*, and, at worst, to *misspecification*. *Underfitting* occurs when statistical models fail to capture the underlying data patterns, potentially causing the generation of predictions outside the data range and hindering the model's ability to generalize its results when confronted with new data. Conversely, *misspecification*, marked by a poor representation of relevant aspects of the true data in the model's functional form or covariate inclusion, can lead to inconsistent and inefficient parameter estimates [@Everitt_et_al_2010].

::: {.column-margin}
**_Boundedness_** Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur [@Lebl_2022].
:::

::: {.column-margin}
**_Underfitting_** Occurs when statistical models fail to capture the underlying data patterns, potentially causing the generation of predictions outside the data range and hindering the model's ability to generalize its results when confronted with new data [@Everitt_et_al_2010].
:::

::: {.column-margin}
**_Misspecification_** Occurs when the model's functional form or inclusion of covariates poorly represents relevant aspects of the true data. This can lead to inconsistent and inefficient parameter estimates [@Everitt_et_al_2010].
:::

## Linear Mixed Models {#sec-interlude3}

### The ordinary LMM {#sec-LMM}

An *ordinary linear mixed model (LMM)* is a procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates [@Holmes_et_al_2019].
A commonly known Bayesian probabilistic representation of an ordinary LMM can be expressed as follows:

::: {.column-margin}
**_Ordinary linear mixed model (LMM)_**
Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates [@Holmes_et_al_2019].
:::

$$
\begin{align*}
y_{ib} &= \beta x_{i} + a_{b} + \varepsilon_{ib} \\
\varepsilon_{ib} &\sim \text{Normal}(0, 1) \\
\beta &\sim \text{Normal}(0, 0.5) \\
a_{b} &\sim \text{Normal}(0, 1)
\end{align*}
$$

where $y_{ib}$ denotes the outcome's $i$'th observation clustered in block $b$, and $x_{i}$ denotes the covariate for observation $i$. Moreover, $\beta$ denotes the fixed slope of the regression, $a_{b}$ denotes the random effects, and $\varepsilon_{ib}$ defines the random outcome residuals. The residuals $\varepsilon_{ib}$ are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, $\beta$ is assumed to be normally distributed with mean zero and standard deviation equal to $0.5$. Similarly, $a_{b}$ is assumed to be normally distributed with mean zero and standard deviation equal to one.

### The generalized LMM {#sec-GLMM}

A *generalized linear mixed model (GLMM)* is a member of a set of models used to estimate (non)linear relationships between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates [@Nelder_et_al_1996]. 
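Before turning to the generalized case, the generative process implied by the ordinary LMM of the previous section can be simulated directly. The sketch below is illustrative only; the number of blocks and observations are hypothetical values, not taken from the study:

```r
# illustrative forward simulation of the ordinary LMM (hypothetical sizes)
set.seed(12345)
B = 5                                       # number of blocks
n = 200                                     # number of observations
block = sample( 1:B, size=n, replace=TRUE ) # block membership of each observation
x = rnorm( n=n, mean=0, sd=1 )              # covariate
beta = rnorm( n=1, mean=0, sd=0.5 )         # fixed slope, drawn from its prior
a = rnorm( n=B, mean=0, sd=1 )              # random block effects
e = rnorm( n=n, mean=0, sd=1 )              # residuals
y = beta*x + a[block] + e                   # clustered outcome
```

Observations sharing a block also share the same draw of $a_{b}$, which is precisely what induces the within-block dependence the LMM accounts for.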
Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of a GLMM, as follows:

::: {.column-margin}
**_Generalized linear mixed model (GLMM)_**
Procedure employed to estimate (non)linear relationships between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates [@Nelder_et_al_1996].
:::

$$
\begin{align*}
y_{ib} &\sim \text{Normal}( \mu_{ib}, 1) \\
\mu_{ib} &= \beta x_{i} + a_{b} \\
\beta &\sim \text{Normal}(0, 0.5) \\
a_{b} &\sim \text{Normal}(0, 1)
\end{align*}
$$

Notice this representation explicitly highlights the three components of a GLMM: the likelihood component, the linear predictor, and the link function [@McElreath_2020]. The likelihood component specifies the assumption about the distribution of an outcome, in this case a normal distribution with mean $\mu_{ib}$ and standard deviation equal to one. The linear predictor specifies the manner in which the covariate will predict the mean of the outcome. In this case the linear predictor is a linear combination of the parameter $\beta$, the covariate $x_{i}$, and the random effects $a_{b}$. The link function specifies the relationship between the mean of the outcome $\mu_{ib}$ and the linear predictor. In this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values within the real line (refer to @sec-normal_dist). Lastly, resulting from the use of Bayesian procedures, a fourth component can be added to any GLMM: the prior distributions. The priors describe what is known about the parameters $\beta$ and $a_{b}$ before observing any empirical data.

::: {.column-margin}
**_GLMM components_**
1. Likelihood component
2. Linear predictor
3. Link function
:::

On the other hand, a Beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:

$$
\begin{align*}
y_{ib} &\sim \text{BetaProp}( \mu_{ib}, 10 ) \\
\mu_{ib} &= \text{logit}^{-1}( \beta x_{i} + a_{b} ) \\
\beta &\sim \text{Normal}(0, 0.5) \\
a_{b} &\sim \text{Normal}(0, 1)
\end{align*}
$$

Notice this representation also highlights the three components of a GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome, with mean $\mu_{ib}$ and sample size equal to $10$. The linear predictor is still a linear combination of the parameter $\beta$, the covariate $x_{i}$, and the random effects $a_{b}$. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by an inverse-logit function: $\text{logit}^{-1}(x) = \exp(x) / (1+\exp(x))$. The inverse-logit function allows the linear predictor to match the range of the mean of the beta-proportion distribution, $\mu_{ib} \in [0,1]$ (refer to @sec-beta_dist). Lastly, as the additional fourth component resulting from the use of Bayesian procedures, the prior assumptions for $\beta$ and $a_{b}$ are also declared.

### Importance {#sec-whyGLMM}

Understanding LMMs is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field [see @Boonen_et_al_2021; @Flipsen_2006; @Lagerberg_et_al_2014]. Furthermore, their significance also lies in their ability to model *clustered* outcomes. *Clustering* occurs when multiple observations arise from the same individual, location, or time [@McElreath_2020]. Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Consequently, such biases and inefficiencies can diminish *statistical power* or increase the likelihood of committing a *type I error*. 
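As a brief aside, the range-matching property of the inverse-logit link used in the Beta-proportion LMM above can be checked numerically. The sketch below is illustrative and uses an arbitrary grid of linear-predictor values:

```r
# numerical check: the inverse-logit maps any real-valued linear predictor
# into the open unit interval required by the beta-proportion mean
inv_logit = function(x){ exp(x) / ( 1 + exp(x) ) }
eta = seq( from=-10, to=10, length.out=101 ) # arbitrary linear predictor values
mu = inv_logit(eta)
range(mu)                                    # strictly within (0, 1)
```

However extreme the linear predictor, the resulting mean never leaves the unit interval, which is what makes the link suitable for bounded outcomes.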
*Statistical power* defines the model's ability to reject the null hypothesis when it is false [@Everitt_et_al_2010]. A *type I error* occurs when a null hypothesis is erroneously rejected [@Everitt_et_al_2010].

::: {.column-margin}
**_Clustering_**
Occurs when multiple observations arise from the same individual, location, or time [@McElreath_2020].
:::

::: {.column-margin}
**_Statistical power_**
The model's ability to reject the null hypothesis when it is false [@Everitt_et_al_2010].
:::

::: {.column-margin}
**_Type I error_**
The error that results when a null hypothesis is erroneously rejected [@Everitt_et_al_2010].
:::

Moreover, the significance of GLMMs lies in offering the same benefits as LMMs in terms of parameter unbiasedness and efficiency. However, the framework also allows for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to @sec-whybeta to understand the importance of considering the bounded nature of the data in the modeling process.

## Measurement error in an outcome {#sec-interlude4}

### What is the problem?

Measurement error refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed *true* value which is not directly observable [@Everitt_et_al_2010]. The problem of measurement error in an outcome is easier to understand with a motivating example. 
Using a model similar to the one depicted in @sec-howitworks, measurement error in the outcome can be represented probabilistically as follows:

::: {.column-margin}
**_Measurement error_**
Refers to the disparity between the observed values of a variable, recorded under similar conditions, and some fixed *true* value which is not directly observable [@Everitt_et_al_2010].
:::

$$
\begin{align*}
\tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

This representation effectively means that a *manifest* outcome $\tilde{y}_{i}$ is assumed to be normally distributed with a mean equal to the *latent* outcome $y_{i}$ and a measurement error $s$. The latent outcome $y_{i}$ is also assumed to be normally distributed, but with a mean $\mu_{i}$ and a standard deviation of one. The mean of the latent outcome is considered to be explained by a linear combination of the covariate $x_{i}$ and its expected effect $\beta$. Lastly, prior to observing any data, $\beta$ is assumed to follow a uniform distribution within the range of $[-20, +20]$, representing a non-informative prior.

For illustrative purposes, a simulated outcome with $n=100$ observations was generated, assuming $\beta=0.2$ and a measurement error of $s=2$. @fig-measurement_simulation shows the scatter plot of the generated data (see code below). The left panel of the figure demonstrates that the *manifest* outcome has a larger spread than the *latent* outcome depicted in the right panel. As a result, although $\beta$ is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected by this larger variability.

The estimation output confirms the previous hypothesis. 
The posterior distribution of $\beta$, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see @fig-measurement_inference and the code output below). Furthermore, the code output shows the parameter's posterior distribution can no longer reject the null hypothesis at confidence levels of $90\%$ and $95\%$, indicating a reduced statistical power.

```{r}
#| label: code-measurement_simulation
#| fig-cap: ''
set.seed(12345) # <1>
n = 100 # <2>
b = 0.2 # <3>
x = rnorm( n=n, mean=0, sd=1 ) # <4>
mu_y = b*x # <5>
y = rnorm( n=n, mean=mu_y, sd=1 ) # <6>
s = 2 # <7>
y_tilde = rnorm( n=n, mean=y, sd=s ) # <8>
```
1. replication seed
2. simulation sample size
3. covariate effect
4. covariate simulation
5. linear predictor on outcome mean
6. latent outcome simulation
7. measurement error
8. manifest outcome simulation

```{r}
#| label: code-measurement_inference
#| fig-cap: ''
# grid approximation
Ngp = 1000 # <1>
b_cand = seq( from=-20, to=20, length.out=Ngp ) # <2>
udf = function(i){ b_cand[i]*x } # <3>
mu_y = sapply( 1:length(b_cand), udf ) # <4>
udf = function(i){ prod( dnorm( y_tilde, mean=mu_y[,i], sd=s ) ) } # <5>
y_lik_man = sapply( 1:length(b_cand), udf ) # <6>
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <7>
y_lik_lat = sapply( 1:length(b_cand), udf ) # <8>
b_prior = rep( 1/40, length(b_cand) ) # <9>
b_prop_man = y_lik_man * b_prior # <10>
b_post_man = b_prop_man / sum(b_prop_man) # <11>
b_prop_lat = y_lik_lat * b_prior # <12>
b_post_lat = b_prop_lat / sum(b_prop_lat) # <13>
```
1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods for manifest outcome
6. manifest outcome data likelihood
7. user defined function: product of individual observation likelihoods for latent outcome
8. latent outcome data likelihood
9. uniform prior distribution for parameter, on manifest and latent outcomes
10. proportional posterior distribution for parameter on manifest outcome
11. posterior distribution for parameter on manifest outcome
12. proportional posterior distribution for parameter on latent outcome
13. posterior distribution for parameter on latent outcome

```{r}
#| label: code-measurement_summary
#| fig-cap: ''
paste0( 'true beta = ', b ) # <1>
# manifest outcome
b_exp_man = sum( b_cand * b_post_man ) # <2>
paste0( 'estimated beta (expectation on manifest) = ', round(b_exp_man, 3) )
b_var_man = sqrt( sum( ( (b_cand-b_exp_man)^2 ) * b_post_man ) ) # <3>
paste0( 'estimated beta (standard deviation on manifest) = ', round(b_var_man, 3) )
# latent outcome
b_exp_lat = sum( b_cand * b_post_lat ) # <4>
paste0( 'estimated beta (expectation on latent) = ', round(b_exp_lat, 3) )
b_var_lat = sqrt( sum( ( (b_cand-b_exp_lat)^2 ) * b_post_lat ) ) # <5>
paste0( 'estimated beta (standard deviation on latent) = ', round(b_var_lat, 3) )
# null hypothesis rejection
b_prob_man = sum( b_post_man[ b_cand > 0 ] ) # <6>
paste0( 'P(estimated beta on manifest > 0) = ', round(b_prob_man, 3) )
b_prob_lat = sum( b_post_lat[ b_cand > 0 ] ) # <7>
paste0( 'P(estimated beta on latent > 0) = ', round(b_prob_lat, 3) )
```
1. true value for the parameter
2. expected value for the parameter on manifest outcome
3. standard deviation for the parameter on manifest outcome
4. expected value for the parameter on latent outcome
5. standard deviation for the parameter on latent outcome
6. probability that the parameter is greater than zero, on manifest outcome
7. probability that the parameter is greater than zero, on latent outcome

```{r}
#| label: fig-measurement_simulation
#| fig-cap: 'Measurement error simulation'
#| fig-height: 4
#| fig-width: 10
par( mfrow=c(1,2) )
plot( x, y_tilde, xlim=c(-3,3), ylim=c(-7,7), # <1>
      pch=19, col=rgb(0,0,0,alpha=0.3),
      ylab=expression(tilde(y)), main='manifest outcome' )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp_man, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
plot( x, y, xlim=c(-3,3), ylim=c(-7,7), # <2>
      pch=19, col=rgb(0,0,0,alpha=0.3), main='latent outcome' )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp_lat, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
par( mfrow=c(1,1) )
```
1. simulation plot of manifest outcome
2. simulation plot of latent outcome

```{r}
#| label: fig-measurement_inference
#| fig-cap: 'Bayesian inference: grid approximation on measurement error outcomes'
#| fig-height: 4
#| fig-width: 10
par( mfrow=c(1,2) )
plot( b_cand, b_post_man, type='l', xlim=c(-0.5,1), # <1>
      main='Posterior on manifest outcome',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_man), lty=2, col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
plot( b_cand, b_post_lat, type='l', xlim=c(-0.5,1), # <2>
      main='Posterior on latent outcome',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_lat), lty=2, col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
par( mfrow=c(1,1) )
```
1. posterior distribution density plot on the manifest outcome
2. posterior distribution density plot on the latent outcome

### How to solve it?

*Latent variables* can be used to address the problem arising from the larger observed variability in one or more manifest outcomes. 
A *latent variable* is a variable that cannot be directly measured but is assumed to be primarily responsible for the variability in one or more manifest variables [@Everitt_et_al_2010]. Latent variables can be interpreted as hypothetical constructs, traits, or *true* variables that account for the variability that induces dependence in one or more manifest variables [@Rabe_et_al_2004a]. This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes [@Rabe_et_al_2004a] (refer to @sec-interlude3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).

::: {.column-margin}
**_Latent variables_**
Variables that cannot be measured directly but are assumed to be primarily responsible for the common variability in one or more manifest variables [@Everitt_et_al_2010].
:::

Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model [@Thurstone_1927; @Luce_1959] are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships [@Hoyle_et_al_2014]. While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part [@Rabe_et_al_2004a].

### Importance {#sec-whylatent}

It becomes evident that when an outcome is measured with error, estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision. 
Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a *type II error*, which occurs when a null hypothesis is erroneously accepted [@Everitt_et_al_2010].

::: {.column-margin}
**_Type II error_**
The error that results when a null hypothesis is erroneously accepted [@Everitt_et_al_2010].
:::

Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker's (latent) potential intelligibility contributes, in part, to the observed variability in the speaker's (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates to test the hypothesis of interest.

## Distributional departures {#sec-interlude5}

### Heteroscedasticity {#sec-heteroscedasticity}

In the context of regression analysis, *heteroscedasticity* occurs when the variance of an outcome depends on the values of another variable [@Everitt_et_al_2010]. The opposite case is called *homoscedasticity*. An example of heteroscedasticity can be probabilistically represented as follows:

::: {.column-margin}
**_Heteroscedasticity_**
Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called *homoscedasticity* [@Everitt_et_al_2010].
:::

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\sigma_{i} &= \exp( \gamma \cdot x_{i} ) \\
\beta &\sim \text{Uniform}( -20, 20 ) \\
\gamma &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

This representation implies that an outcome $y_{i}$ is assumed normally distributed with mean $\mu_{i}$ and a standard deviation $\sigma_{i}$. 
Furthermore, the mean and standard deviation of the outcome are explained by the covariate $x_{i}$ through the parameters $\beta$ and $\gamma$, respectively. Lastly, prior to observing any data, $\beta$ and $\gamma$ are assumed to be uniformly distributed within the range of $[-20,+20]$.

@fig-heteroscedasticity illustrates the presence of heteroscedasticity using the previous representation, assuming a sample size of $n=100$ and parameters $\beta=0.2$ and $\gamma=1$. Notice the variability of the outcome increases as the covariate increases. Consequently, it is easy to intuit that this difference in the outcome's variability could have an impact on the statistical hypothesis tests of $\beta$, and even on the estimate itself. To test this intuition, an incorrect model is used to estimate $\beta$.

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

As a result, the intuition proves accurate. When an outcome is erroneously assumed *homoscedastic*, the parameter estimates not only become inefficient but are also estimated farther from the *true* value, as seen in the code output below and in @fig-heteroscedasticity_inference.

```{r}
#| label: code-heteroscedasticity
#| fig-cap: ''
set.seed(12345) # <1>
n = 100 # <2>
b = 0.2 # <3>
g = 1
x = rnorm( n=n, mean=0, sd=1 ) # <4>
mu_y = b*x # <5>
s_y = exp(g*x)
y = rnorm( n=n, mean=mu_y, sd=s_y ) # <6>
```
1. replication seed
2. simulation sample size
3. beta and gamma effects
4. covariate simulation
5. (non)linear predictor on outcome mean and standard deviation
6. outcome simulation

```{r}
#| label: code-heteroscedasticity_inference
#| fig-cap: ''
# grid approximation
Ngp = 1000 # <1>
b_cand = seq( from=-20, to=20, length.out=Ngp ) # <2>
udf = function(i){ b_cand[i]*x } # <3>
mu_y = sapply( 1:length(b_cand), udf ) # <4>
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <5>
y_lik = sapply( 1:length(b_cand), udf ) # <6>
b_prior = rep( 1/40, length(b_cand) ) # <7>
b_prop = y_lik * b_prior # <8>
b_post = b_prop / sum(b_prop) # <9>
```
1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods
6. outcome data likelihood
7. uniform prior distribution for parameter (min=-20, max=20)
8. proportional posterior distribution for parameter
9. posterior distribution for parameter

```{r}
#| label: code-heteroscedasticity_summary
#| fig-cap: ''
paste0( 'true beta = ', b ) # <1>
b_exp = sum( b_cand * b_post ) # <2>
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ] # <3>
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) ) # <4>
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] ) # <5>
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
```
1. true value for the parameter
2. expected value for the parameter
3. maximum probability value for the parameter
4. standard deviation for the parameter
5. probability that the parameter is greater than zero

```{r}
#| label: fig-heteroscedasticity
#| fig-cap: 'Heteroscedasticity simulation'
#| fig-height: 4
#| fig-width: 5
plot( x, y, xlim=c(-3,3), ylim=c(-6,6), # <1>
      pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp, lty=2, col='red' )
abline( a=-4, b=-1, lty=2, col=rgb(0,0,0,alpha=0.3) )
abline( a=4.4, b=1.5, lty=2, col=rgb(0,0,0,alpha=0.3) )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
```
1. scatter plot of a heteroscedastic outcome

```{r}
#| label: fig-heteroscedasticity_inference
#| fig-cap: 'Bayesian inference: grid approximation'
#| fig-height: 4
#| fig-width: 10
par( mfrow=c(1,2) )
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5), # <1>
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1), # <2>
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2, col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
par( mfrow=c(1,1) )
```
1. prior distribution density plot
2. posterior distribution density plot

### Outliers {#sec-outliers}

In regression analysis, *outliers* are defined as observations that appear to deviate markedly from the other sample data points among which they occur [@Everitt_et_al_2010]. Although outliers admit no single probabilistic representation, a simple example is illustrated in @fig-outliers. The figure depicts the presence of three influential observations in the outcome (colored blue). 
It is easy to intuit that, in the presence of influential observations, the parameter estimates, and the hypothesis tests resulting from them, can be affected.

::: {.column-margin}
**_Outlier_**
An observation that appears to deviate markedly from the other sample data points among which it occurs [@Everitt_et_al_2010].
:::

The intuition proves correct when $\beta$ is estimated using the same *incorrect* model used in @sec-heteroscedasticity. When an outcome is erroneously assumed to be free of outliers, the parameter value is estimated farther from the truth, as observed in the code output below and in @fig-outliers_inference.

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

```{r}
#| label: code-outliers
#| fig-cap: ''
set.seed(12345) # <1>
n = 100 # <2>
b = 0.2 # <3>
x = rnorm( n=n, mean=0, sd=1 ) # <4>
mu_y = b*x # <5>
y = rnorm( n=n, mean=mu_y, sd=1 ) # <6>
idx = which( x > 1 ) # <7>
sel = 1:3
y[idx[sel]] = 6
```
1. replication seed
2. simulation sample size
3. beta effect
4. covariate simulation
5. linear predictor on outcome mean
6. outcome simulation
7. outlier simulation

```{r}
#| label: code-outliers_inference
#| fig-cap: ''
# grid approximation
Ngp = 1000 # <1>
b_cand = seq( from=-20, to=20, length.out=Ngp ) # <2>
udf = function(i){ b_cand[i]*x } # <3>
mu_y = sapply( 1:length(b_cand), udf ) # <4>
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <5>
y_lik = sapply( 1:length(b_cand), udf ) # <6>
b_prior = rep( 1/40, length(b_cand) ) # <7>
b_prop = y_lik * b_prior # <8>
b_post = b_prop / sum(b_prop) # <9>
```
1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods
6. outcome data likelihood
7. uniform prior distribution for parameter (min=-20, max=20)
8. proportional posterior distribution for parameter
9. posterior distribution for parameter

```{r}
#| label: code-outliers_summary
#| fig-cap: ''
paste0( 'true beta = ', b ) # <1>
b_exp = sum( b_cand * b_post ) # <2>
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ] # <3>
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) ) # <4>
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] ) # <5>
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
```
1. true value for the parameter
2. expected value for the parameter
3. maximum probability value for the parameter
4. standard deviation for the parameter
5. probability that the parameter is greater than zero

```{r}
#| label: fig-outliers
#| fig-cap: 'Outliers simulation'
#| fig-height: 4
#| fig-width: 5
plot( x, y, xlim=c(-3,3), ylim=c(-6,6), # <1>
      pch=19, col=rgb(0,0,0,alpha=0.3) )
points( x[idx[sel]], y[idx[sel]], # <1>
        pch=19, col=rgb(0,0,1,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue' )
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
```
1. scatter plot of an outcome with outliers

```{r}
#| label: fig-outliers_inference
#| fig-cap: 'Bayesian inference: grid approximation'
#| fig-height: 4
#| fig-width: 10
par( mfrow=c(1,2) )
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5), # <1>
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1), # <2>
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp), lty=2, col=c('gray','blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,2) )
par( mfrow=c(1,1) )
```
1. prior distribution density plot
2. posterior distribution density plot

### Solution

As recommended by McElreath [-@McElreath_2020], *robust models* can be used to deal with these types of distributional departures. *Robust models* are a general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate departures of the data from the model's assumptions [@Everitt_et_al_2010]. The procedure consists of modifying the statistical models to include traits that effectively make them *robust* to small departures from the distributional assumptions, such as heteroscedastic errors or the presence of *outliers*.

::: {.column-margin}
**_Robust models_**
A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate departures of the data from the model's assumptions [@Everitt_et_al_2010].
:::

### Importance {#sec-whyrobust}

It is known that dealing with *heteroscedasticity* and identifying *outliers* through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately *bias* the parameter estimates, and even make them inefficient [@McElreath_2020]. *Bias* refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated [@Everitt_et_al_2010].

::: {.column-margin}
**_Bias_**
Refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated [@Everitt_et_al_2010].
:::

Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study, because there is an interest in testing hypotheses about the potential intelligibility of speakers. 
Therefore, it is necessary to consider the use of robust regression models to assess these distributional departures and generate unbiased parameter estimates.

# Introduction {#sec-introduction}

Intelligibility is at the core of successful, felicitous communication. Thus, being able to speak intelligibly is a major achievement in language acquisition and development. Furthermore, intelligibility is considered to be the most practical index to assess competence in oral communication [@Kent_et_al_1994]. Consequently, it serves as a key indicator for evaluating the effectiveness of various interventions like speech therapy or cochlear implantation [@Chin_et_al_2012]. *Speech intelligibility* refers to the extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words [@Freeman_et_al_2017; @vanHeuven_2008; @Whitehill_et_al_2004]. Studies that investigate *intelligibility* have utilized entropy scores to examine differences in children's intelligibility, particularly between those with normal hearing and those with cochlear implants [@Boonen_et_al_2021].

::: {.column-margin}
**_Speech intelligibility_**
The extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words [@Freeman_et_al_2017; @vanHeuven_2008; @Whitehill_et_al_2004].
:::

However, despite their potential as a fine-grained metric of intelligibility, as proposed by Boonen et al. [-@Boonen_et_al_2021], entropy scores exhibit a statistical complexity that cautions researchers against treating them as straightforward indices of intelligibility. This complexity emerges from the processes of data collection and transcription aggregation, endowing the scores with four distinctive features: boundedness, measurement error, clustering, and the possible presence of outliers and heteroscedasticity. 
Firstly, entropy scores are confined to an interval between zero and one, a phenomenon known as boundedness (refer to @sec-interlude2). Secondly, entropy scores are a manifestation of a speaker's intelligibility, with this intelligibility being the primary factor influencing the observed scores. This issue is commonly referred to as measurement error (refer to @sec-interlude4). Thirdly, due to the repeated assessment of speakers through multiple speech samples, the scores exhibit clustering (refer to @sec-interlude3). Lastly, driven by the specific set of speakers and speech samples under scrutiny, these scores often display a potential for the presence of outliers and heteroscedasticity (refer to @sec-interlude5).

Failure to collectively address these data features can result in numerous statistical challenges that might hamper the researcher's ability to investigate intelligibility. Notably, neglecting boundedness can, at best, lead to underfitting and, at worst, to misspecification. Underfitting can cause the generation of inconsistent predictions, thus hindering the model's ability to generalize when confronted with new data. Conversely, misspecification can lead to inconsistent and less efficient parameter estimates (refer to @sec-whybeta). Additionally, overlooking issues such as measurement error, clustering, outliers or heteroscedasticity can lead to biased and less precise parameter estimates, ultimately diminishing the statistical power of models and increasing the likelihood of committing type I or type II errors when addressing research inquiries (refer to @sec-whylatent, @sec-whyGLMM, and @sec-whyrobust).

In the realm of computational statistics and data analysis, several models have been developed to address some of these data features individually and, at times, collectively. All of these models have found moderate adoption in various fields, including speech communication, psychology, education, health care, chemistry, and policy analysis. 
Specifically, in the domain of speech communication, Boonen et al. [-@Boonen_et_al_2021] addressed data clustering within the context of intelligibility research. Meanwhile, de Brito Trindade et al. [-@de_Brito_et_al_2021] and Kangmennaang et al. [-@Kangmennaang_et_al_2023] concentrated on tackling non-normal bounded data with measurement error in covariates, within the contexts of chemical reactions and health care access, respectively. Remarkably, despite these individual efforts, there is, to the best of the authors' knowledge, no study comprehensively addressing all of these data features in a principled way while also transparently and systematically documenting the Bayesian estimation of the resulting statistical models.

# Research questions {#sec-RQs}

Considering the imperative need to comprehensively address all data features when investigating unobservable and complex traits, this investigation aims to demonstrate the efficacy of the Generalized Linear Latent and Mixed Model (GLLAMM) in handling the features of entropy scores when exploring research theories concerning speech intelligibility. To achieve this objective, the study will reexamine data originating from transcriptions of spontaneous speech samples, initially collected by Boonen et al. [-@Boonen_et_al_2021]. Subsequently, this data will be aggregated into entropy scores and subjected to modeling through the Bayesian Beta-proportion GLLAMM. To address the primary objective, the study poses three key research questions. First, given the importance of accurate predictions in developing useful practical models and testing research theories [@Shmueli_et_al_2011], `Research Question 1 (RQ1)` assesses whether the Beta-proportion GLLAMM yields more accurate predictions than the more prevalent Normal Linear Mixed Model (LMM) [@Holmes_et_al_2019].
Second, acknowledging that intelligibility is an unobservable, intricate concept and a key indicator of oral communication competence [@Kent_et_al_1994], `Research Question 2 (RQ2)` investigates how the proposed model can estimate speakers' latent intelligibility from manifest entropy scores. Third, recognizing that research involves developing and comparing theories, `Research Question 3 (RQ3)` illustrates how these research theories can be examined within the model's framework. Specifically, `RQ3` assesses the influence of speaker-related factors on the newly estimated latent intelligibility. The findings of this study will equip researchers investigating speech intelligibility using entropy scores, or those grappling with similar data challenges, with a statistical tool that improves upon existing research models. The tool will provide an assessment of the predictability of empirical phenomena, along with the capability to develop a quantitative measure for the latent variable of interest. The latter, in turn, could facilitate the appropriate comparison of existing theories related to the latent variable, and even the development of new ones.

# Data {#sec-data}

The data comprised the transcriptions of spontaneous speech samples originally collected by Boonen et al. [-@Boonen_et_al_2021]. The data is not publicly available due to privacy restrictions. Nonetheless, the data can be provided by the corresponding author upon reasonable request.

## Speakers

Boonen et al. [-@Boonen_et_al_2021] selected $32$ speakers, comprising $16$ normal hearing children (`NH`) and $16$ hearing-impaired children with cochlear implants (`HI/CI`). At the time of the collection of the speech samples, the `NH` group was between $68$ and $104$ months old ($M = 86.3$, $SD = 9.0$), while the `HI/CI` group was between $78$ and $98$ months old ($M = 86.3$, $SD = 6.7$).
## Speech samples Boonen and colleagues selected speech samples from a large corpus of children’s spontaneously spoken speech recordings. These recordings were obtained as the children narrated a story prompted by the picture book "Frog, Where Are You?" [@Mayer_1969] to a caregiver ‘unfamiliar with the story’. Before recording, the children were allowed to skim over the booklet and examine pictures. Prior to the selection process, the recordings were orthographically transcribed using the CHAT format in the CLAN editor [@MacWhinney_2020]. These transcriptions were exclusively used in the curation of appropriate speech samples. To ensure the quality of the selection, Boonen and colleagues excluded sentences containing syntactically ill-formed or incomplete statements, background noise, crosstalk, long hesitations, revisions, or non-words. Finally, ten speech samples were randomly chosen for each of the $32$ selected speakers. Each of these samples comprised a single sentence with a length of three to eleven words ($M = 7.1$, $SD = 1.1$). The process resulted in a total of $320$ selected sentences collectively comprising $2,263$ words. ::: {.column-margin}**_Speech samples_**Sentences with a length of three to eleven words ($M = 7.1$, $SD = 1.1$).:::## Listeners Boonen and colleagues recruited $105$ students from the University of Antwerp. All participants were native speakers of Belgian Dutch and reported no history of hearing difficulties or prior exposure to the speech of hearing-impaired speakers. ## Transcription task {#subsec-transcriptions}The $320$ speech samples and $105$ listeners were randomly assigned to five blocks, with each block consisting of approximately $21$ listeners who transcribed $64$ sentences presented in random order. This resulted in a total of $47,514$ transcribed words from the original $2,263$ words present in the speech samples. 
These orthographic transcriptions were automatically aligned with a Python script at the sentence level, in a column-like grid structure like the one presented in @tbl-alignment. This alignment process was repeated for each sentence within each speaker and block, and the output was manually checked and adjusted (if needed) in order to appropriately align the words. For more details on the random assignment and alignment procedures refer to Boonen et al. [-@Boonen_et_al_2021].

## Entropy calculation {#subsec-entropy_calculation}

Next, this study aggregated the aligned transcriptions over listeners, yielding $2,263$ entropy scores, one score per word. The entropy scores were calculated following Shannon’s formula [-@Shannon_1948]:

::: {.column-margin}
**_Entropy formula_**
:::

$$ \begin{equation}H_{wsib} = - \frac{ \sum_{k=1}^{K} p_{k} \cdot log_{2}(p_{k}) }{ log_{2}(J)}\end{equation}$$ {#eq-entropy}

where $H_{wsib}$ denotes the entropy scores confined to an interval between zero and one, with $w$ defining the word index, $s$ the sentence index, $i$ the speaker index, and $b$ the block index. Moreover, $K$ describes the number of different word types within transcriptions, and $J$ defines the total number of word transcriptions. Notice that by design, the total number of word transcriptions $J$ corresponds with the number of listeners per block, i.e., $21$ listeners. Lastly, $p_{k} = \sum_{j=1}^{J} 1(T_{jk}) / J$ denotes the proportion of word types within transcriptions, with $1(T_{jk})$ describing an indicator function that takes the value of one when the word type $k$ is present in the transcription $j$. These entropy scores served as the outcome variable, capturing agreement or disagreement among listeners’ word transcriptions.
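To make the calculation concrete, below is a minimal base-R sketch of @eq-entropy (the function name `word_entropy` is illustrative and not part of the study's analysis code); it assumes multiple *[X]* marks have already been disambiguated into distinct word types:

```r
# Illustrative sketch (not from the original analysis code): normalized
# Shannon entropy for one word position, following @eq-entropy.
# 'transcriptions' holds one transcribed word type per listener (J = length).
word_entropy = function(transcriptions) {
  J = length(transcriptions)
  p = table(transcriptions) / J      # proportions p_k of each word type
  -sum( p * log2(p) ) / log2(J)      # normalized entropy, confined to [0, 1]
}

# Word 2 of @tbl-alignment: four listeners wrote 'jongen', one wrote 'hond'
round( word_entropy( c('jongen','jongen','jongen','jongen','hond') ), 4 ) # 0.3109
```

When all $J$ listeners transcribe a different word type, the numerator equals $log_{2}(J)$ and the score reaches its maximum of one, matching word $5$ in @tbl-alignment.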
Lower scores indicated a higher degree of agreement between transcriptions and therefore higher intelligibility, while higher scores indicated lower intelligibility, due to a lower degree of agreement in the transcriptions [@Boonen_et_al_2021; @Faes_et_al_2021]. Furthermore, no score is excluded from the modeling process using univariate procedures; rather, the identification of highly influential observations is performed within the context of the proposed models, as recommended by McElreath [-@McElreath_2020] (refer to @sec-interlude5).

::: {.column-margin}
**_Entropy interpretation_**
Lower scores indicated a higher degree of agreement between transcriptions and therefore higher intelligibility, while higher scores indicated lower intelligibility, due to a lower degree of agreement in the transcriptions [@Boonen_et_al_2021; @Faes_et_al_2021]
:::

+---------------+---------+---------+---------+---------+---------+
| Transcription | Words   |         |         |         |         |
+---------------+---------+---------+---------+---------+---------+
| Number        | 1       | 2       | 3       | 4       | 5       |
+:=============:+:=======:+:=======:+:=======:+:=======:+:=======:+
| 1             | de      | jongen  | ziet    | een     | kikker  |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | sees    | a       | frog    |
+---------------+---------+---------+---------+---------+---------+
| 2             | de      | jongen  | ziet    | de      | [X]     |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | sees    | the     | [X]     |
+---------------+---------+---------+---------+---------+---------+
| 3             | de      | jongen  | zag     | [B]     | kokkin  |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | saw     | [B]     | cook    |
+---------------+---------+---------+---------+---------+---------+
| 4             | de      | jongen  | zag     | geen    | kikkers |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | saw     | no      | frogs   |
+---------------+---------+---------+---------+---------+---------+
| 5             | de      | hond    | zoekt   | een     | [X]     |
+---------------+---------+---------+---------+---------+---------+
|               | the     | dog     | searches| a       | [X]     |
+---------------+---------+---------+---------+---------+---------+
| **Entropy**   | $0$     |$0.3109$ |$0.6555$ |$0.8277$ |$1$      |
+---------------+---------+---------+---------+---------+---------+

: Hypothetical alignment of word transcriptions and entropy scores. Note: Extracted from Boonen et al. [-@Boonen_et_al_2021], and slightly modified for illustrative purposes. Entropy scores are calculated for the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners $\left( s=1, i=1, b=1, J=5 \right)$. Transcriptions are in Dutch with English translation. *[B]* represents a blank space, and *[X]* unidentifiable speech. {#tbl-alignment .striped .hover}

In this context, it is relevant to exemplify the entropy calculation procedure. For that purpose, the words in positions two, four and five observed in @tbl-alignment were used. These words were assumed present in the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners ($w=\{2,4,5\}$, $s=1$, $i=1$, $b=1$, $J=5$). For word $2$, the first four listeners identified the word type *jongen* $(T_{j1})$, while the last identified the word type *hond* $(T_{j2})$. Therefore, two word types were identified ($K=2$), with proportions equal to $\{ p_{1}, p_{2} \} = \{ 4/5, 1/5 \} = \{ 0.8, 0.2 \}$, and entropy score equal to:

$$ H_{2111} = - \frac{ 0.8 \cdot log_{2}(0.8) + 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.3109$$

For word $4$, two listeners identified the word type *een* $(T_{j1})$, one listener the word type *de* $(T_{j2})$, and another the word *geen* $(T_{j3})$. A blank space *[B]* is a symbol that defines the absence of a word in a space where a word is expected, as compared with other transcriptions, during the alignment procedure.
Notice that for calculation purposes, because a blank space is not expected in such a position, it is considered a different word type. Consequently, four word types were registered ($K=4$), with proportions equal to $\{ p_{1}, p_{2}, p_{3}, p_{4} \} = \{ 2/5, 1/5, 1/5, 1/5 \} = \{ 0.4, 0.2, 0.2, 0.2 \}$ and entropy score equal to:

$$ H_{4111} = - \frac{ 0.4 \cdot log_{2}(0.4) + 3 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.8277$$

Lastly, for word $5$, each listener transcribed a different word. It is important to highlight that when a listener does not identify a complete word, or part of it, (s)he is instructed to write *[X]* in that position. However, for the calculation of the entropy score, if more than one listener marks an unidentifiable word with *[X]*, each one of them is considered a different word type. This is done to avoid the artificial reduction of the entropy score, as *[X]* values already indicate the word’s lack of intelligibility. Consequently, five word types were observed, $T_{j1}=$*kikker*, $T_{j2}=$*[X]*, $T_{j3}=$*kokkin*, $T_{j4}=$*kikkers*, $T_{j5}=$*[X]* ($K=5$), with proportions equal to $\{ p_{1}, p_{2}, p_{3}, p_{4}, p_{5} \} = \{ 1/5, 1/5, 1/5, 1/5, 1/5 \} = \{ 0.2, 0.2, 0.2, 0.2, 0.2 \}$, and entropy score equal to:

$$ H_{5111} = - \frac{ 5 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} = 1$$

## Exploring the data {#subsec-data_exploration}

```{r}
#| label: code-usd
#| fig-cap: ''
#| echo: false

# load functions
main_dir = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/paper1'
source( file.path( main_dir, 'real_code', '0_sim_extra.R') )
```

```{r}
#| label: code-data_load
#| fig-cap: ''
#| echo: false

data_dir = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/#data/final'
data_nam = "data_H_list.RData"
model_data = file.path( data_dir, data_nam )
load( model_data )
# str(dlist)
var_int = c('bid','cid','uid','wid','HS','A','Am','Hwsib')
data_H = data.frame( dlist[var_int] )
# str(data_H)
```

As expected, the data exploration reveals from the start two significant features of the entropy scores: *clustering* and *boundedness* (refer to @sec-whybeta and @sec-whyGLMM). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originating from different speakers and evaluated in different blocks (see code output below, depicting the first ten observations of the data). On the other hand, entropy scores exhibit *boundedness* as they can only take on values within the continuous interval between zero and one, particularly $H_{wsib} \in [0,1]$ (see @fig-entropy_data showing six randomly selected speakers).

```{r}
#| label: code-dataframe
#| fig-cap: ''

var_int = c('bid','cid','uid','wid','HS','A','Am','Hwsib') # <1>
head( data_H[, var_int], 10 ) # <2>
```

1. selecting variables of interest
2. showing the first 10 observations of the data

```{r}
#| label: fig-entropy_data
#| fig-cap: 'Entropy scores distribution: all sentences of selected speakers'
#| fig-height: 7
#| fig-width: 10

require(rethinking) # <1>
speakers = c(20,8,11, 25,30,6) # <2>
par(mfrow=c(2,3))
for( i in speakers ){ # <3>
  speaker = data_H$cid == i
  dat = binning( y=data_H$Hwsib[speaker], min_y=0, max_y=1, n_bins=20, dens=T )
  plot( dat, ylab="Frequency-Density", ylim=c(-0.15,max(dat)), xaxt='n',
    xlim=c(-0.05,1.05), xlab='entropy', col=rgb(0,0,0,0.6) )
  abline( h=0, col='gray' )
  abline( v=c(0,1), lty=2, col=rgb(0,0,0,0.3) )
  axis( side=1, at=as.numeric(names(dat)), labels=names(dat), las=2, cex.axis=0.8 )
  mtext( text=paste0('Speaker ', i), side=3, adj=0, cex=1.1)
}
par(mfrow=c(1,1))
```

1. package requirement
2. selection of speakers
3.
density plot for all sentences of each selected speaker

Additionally, the data shows the $320$ speakers' speech samples consist of sentences with a minimum of $3$ and a maximum of $11$ words per sentence ($M=7.1$, $SD=1.1$), where most of the speech samples have between $5$ and $9$ words per sentence (see @fig-speech_samples). <!-- The minimum is observed in the third sentence for speaker $6$ and the maximum in the second sentence for speaker $9$, respectively. -->

```{r}
#| label: code-speech_samples1
#| fig-cap: ''

speech_samples = with( data_H, table(cid, uid) ) # <1>
speech_samples
```

1. report speech samples per speaker and sentence

```{r}
#| label: fig-speech_samples
#| fig-cap: 'Histogram of words per sentences in the speech samples'
#| fig-height: 4
#| fig-width: 5

speech_samples = data.frame( speech_samples )
hist( speech_samples$Freq, breaks=20, xlim=c(2, 12), # <1>
  main='', xlab='words per sentence')
```

1. histogram of words per sentence

```{r}
#| label: code-speech_samples2
#| fig-cap: ''

psych::describe( speech_samples$Freq ) # <1>
```

1. statistical descriptors for the speech samples

Moreover, the data comprised $16$ normal hearing children (`NH`, hearing status category $1$) and $16$ hearing-impaired children with cochlear implants (`HI/CI`, hearing status category $2$). At the time of the collection of the speech samples, the `NH` group was between $68$ and $104$ months old ($M=86.3$, $SD=9.0$), while the `HI/CI` group was between $78$ and $98$ months old ($M=86.3$, $SD=6.7$).

```{r}
#| label: code-A_HS
#| fig-cap: ''

d_mom = unique( data_H[,c('cid','HS','A')] ) # <1>
with( d_mom, table( A, HS ) ) # <2>
```

1. unique hearing status and chronological age per speaker
2.
number of speakers per chronological age and hearing status

Lastly, before fitting the models using Bayesian inference, the data was formatted as a list including all necessary information for the fitting process:

```{r}
#| label: code-datalist
#| fig-cap: ''

dlist = list(
  N   = nrow(data_H),               # <1>
  B   = max(data_H$bid),            # <2>
  I   = max(data_H$cid),            # <3>
  U   = max(data_H$uid),            # <4>
  W   = max(data_H$wid),            # <5>
  cHS = max(data_H$HS),             # <6>
  bid = data_H$bid,                 # <7>
  cid = data_H$cid,                 # <8>
  uid = data_H$uid,                 # <9>
  wid = data_H$wid,                 # <10>
  HS  = data_H$HS,                  # <11>
  A   = data_H$A,                   # <12>
  Am  = with( data_H, A - min(A) ), # <13>
  Hwsib = data_H$Hwsib              # <14>
)
str(dlist)
```

1. Number of observations
2. Maximum number of blocks
3. Maximum number of speakers
4. Maximum number of sentences
5. Maximum number of words
6. Maximum number of categories in hearing status
7. Data block ID
8. Data speaker ID
9. Data sentence ID
10. Data word ID
11. Data hearing status
12. Data chronological age
13. Data chronological age (centered)
14. Data entropy score

# Methods {#sec-methods}

This section articulates the probabilistic formalism of both the Normal LMM and the proposed Beta-proportion GLLAMM. Subsequently, it details the set of fitted models and the estimation procedure, along with the criteria employed to assess the quality of the Bayesian inference results. Lastly, the section outlines the methodology employed for model comparison.

## Statistical models {#sec-models}

### Normal LMM {#sec-normal_LMM}

The general mathematical formalism of the Normal LMM posits that the likelihood of the (manifest) entropy scores $H_{wsib}$ follows a normal distribution, i.e.

$$\begin{align}H_{wsib} & \sim \text{Normal} \left( \mu_{sib}, \sigma_{i} \right)\end{align} $$ {#eq-normal_LMM_likelihood}

where $\mu_{sib}$ represents the average entropy at the word-level and $\sigma_{i}$ denotes the standard deviation of the average entropy at the word-level, varying for each speaker.
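As a quick illustration of why this likelihood can be problematic for bounded scores (using hypothetical parameter values, not estimates from this study; refer to @sec-whybeta), draws from a normal distribution are not confined to the entropy scale $[0, 1]$:

```r
# Illustrative sketch with hypothetical values: the normal likelihood places
# probability mass outside the feasible entropy range [0, 1].
set.seed(1)
mu_sib  = 0.5            # assumed average word-level entropy
sigma_i = c(0.2, 0.4)    # assumed speaker-specific standard deviations
H_sim   = rnorm( 2000, mean=mu_sib, sd=rep(sigma_i, each=1000) )
mean( H_sim < 0 | H_sim > 1 )   # share of simulated scores outside [0, 1]
```

Even with these moderate values, a non-negligible share of simulated scores falls outside the feasible range, and the share grows with $\sigma_{i}$.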
Given the clustered nature of the data, $\mu_{sib}$ is defined by the linear combination of individual characteristics and several random effects:

$$\begin{align}\mu_{sib} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + u_{si} + e_{i} + a_{b}\end{align} $$ {#eq-normal_LMM_linearpred}

where $HS_{i}$ and $A_{i}$ denote the hearing status and chronological age of speaker $i$, respectively. Additionally, $\alpha$ denotes the general intercept, $\alpha_{HS[i]}$ represents the average entropy for each hearing status group, and $\beta_{A,HS[i]}$ denotes the evolution of the average entropy per unit of chronological age $A_{i}$ for each hearing status group. Furthermore, $u_{si}$ denotes the sentence-speaker random effects measuring the unexplained entropy variability within sentences for each speaker, $e_{i}$ denotes the speaker random effects describing the unexplained entropy variability between speakers, and $a_{b}$ denotes the block random effects assessing the unexplained variability between experimental blocks.

Several notable features of the Normal LMM can be discerned from the equations. Firstly, @eq-normal_LMM_likelihood indicates that the variability of the average entropy at the word-level can differ for each speaker, enhancing the model's *robustness* to mild or moderate data departures from the normal distribution assumption, such as heteroscedasticity or outliers (refer to @sec-interlude5). Secondly, @eq-normal_LMM_linearpred reveals that the model assumes no transformation is applied to the relationship between the average entropy and the linear predictor. This is commonly known as a direct link function. Moreover, @eq-normal_LMM_linearpred indicates that chronological age is *centered* around the minimum chronological age in the sample $\bar{A}$. The *centering* procedure is employed to prevent the interpretation of parameters outside the range of chronological ages available in the data [@Everitt_et_al_2010].
Lastly, the equation implies the model considers a separate intercept and a separate slope of age for each hearing status group, i.e., `NH` and `HI/CI` speakers.

::: {.column-margin}
**_Centering_**
Procedure used to facilitate the interpretation of regression parameters [@Everitt_et_al_2010].
:::

### Beta-proportion GLLAMM {#sec-beta_GLLAMM}

The general mathematical formalism of the proposed Beta-proportion GLLAMM comprises four components: a response model, with its likelihood, linear predictor, and link function, and a structural model. The response model posits the likelihood of entropy scores follows a beta-proportion distribution,

::: {.column-margin}
**_GLLAMM components_**
1. Response model, likelihood
2. Response model, linear predictor
3. Response model, link function
4. Structural equation model
:::

$$\begin{align}H_{wsib} & \sim \text{BetaProp} \left( \mu_{ib}, M_{i} \right)\end{align} $$ {#eq-beta_GLLAMM_likelihood}

where $\mu_{ib}$ denotes the average entropy at the word-level and $M_{i}$ signifies the *dispersion* of the average entropy at the word-level, varying for each speaker. Additionally, $\mu_{ib}$ is defined as,

$$\begin{align}\mu_{ib} &= \text{logit}^{-1}[ a_{b} - SI_{i} ]\end{align} $$ {#eq-beta_GLLAMM_linpred}

where $\text{logit}^{-1}(x) = exp(x) / (1+exp(x))$ is the inverse-logit link function, $a_{b}$ denotes the block random effects, and $SI_{i}$ describes the speaker's latent *potential intelligibility*. Conversely, the structural equation model relates the speakers' latent potential intelligibility to the individual characteristics:

$$\begin{align}SI_{i} = \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i}\end{align} $$ {#eq-beta_GLLAMM_structural}

where $\alpha$ defines the general intercept, $\alpha_{HS[i]}$ denotes the potential intelligibility for different hearing status groups, and $\beta_{A,HS[i]}$ indicates the evolution of potential intelligibility per unit of chronological age for each hearing status group.
Furthermore, $e_{i}$ represents the speaker random effects, describing unexplained potential intelligibility variability between speakers, and $u_{i} = \sum_{s=1}^{S} u_{si}/S$ denotes the sentence random effects, assessing the average unexplained potential intelligibility variability among sentences within each speaker, with $S$ denoting the total number of sentences per speaker.

Several features are evident in this probabilistic representation. Firstly, akin to the Normal LMM, @eq-beta_GLLAMM_likelihood reveals that the *dispersion* of average entropy at the word level can differ for each speaker. This enhances the model's robustness to mild or moderate data departures from the beta-proportion distribution assumption (refer to @sec-interlude5). Secondly, in contrast with the Normal LMM, @eq-beta_GLLAMM_linpred shows the potential intelligibility of a speaker has a negative non-linear relationship with the entropy scores, explicitly highlighting the inverse relationship between intelligibility and entropy. This feature also maps the unbounded linear predictor to the bounded limits of the entropy scores. Thirdly, in contrast with the Normal LMM, @eq-beta_GLLAMM_structural demonstrates that the structural parameters are interpretable in terms of the latent potential intelligibility scores, where the scale of the latent trait is set by the general intercept $\alpha$, as required in latent variable models [@Depaoli_2021]. Furthermore, the equation implies the model also considers a separate intercept and a separate slope of age for each hearing status group, i.e., `NH` and `HI/CI` speakers. Additionally, @eq-beta_GLLAMM_structural indicates that chronological age is *centered* around the minimum chronological age in the sample $\bar{A}$. Lastly, the same equation assumes the intelligibility scores have two sources of unexplained variability: $e_{i}$ and $u_{i}$.
The former represents inherent differences in potential intelligibility among different speakers, while the latter assumes that different sentences measure potential intelligibility differently due to variations in word difficulties and their interplay within the sentence.

## Prior distributions {#sec-priors}

Bayesian procedures require the incorporation of priors (refer to @sec-interlude1). This study establishes priors and hyperpriors for the parameters of both the Normal LMM and the Beta-proportion GLLAMM using *prior predictive simulations*. This procedure entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models' specifications. The procedure aims to establish meaningful priors and comprehend their implications within the context of the model before incorporating any information derived from empirical data [@McElreath_2020].

::: {.column-margin}
**_Prior predictive simulations_**
Procedure that entails the semi-independent simulation of parameters, which are subsequently transformed into simulated data values according to the models' specifications. The procedure aims to establish meaningful priors and comprehend their implications within the context of the model before incorporating any information derived from empirical data [@McElreath_2020].
:::

### Normal LMM {#sec-prior_normal_LMM}

For the parameters of the Normal LMM, non-informative priors and hyperpriors are established to align with analogous model assumptions in frequentist methods (refer to @sec-prior_effects). The specified priors are as follows:

#### Standard deviation $\sigma_{i}$

As described in @sec-fitted, the models initially consider one $\sigma$ prior for all the speakers.
This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word-level is the same for all speakers, prior to the observation of empirical data.

$$\begin{align}\sigma_{i} &\sim \text{Exponential}\left( 2 \right)\end{align}$$ {#eq-normal_LMM_sigma_prior1}

The left panel of @fig-sigma_prior_normalLMM1 shows the weakly informative prior expects $\sigma$ to be possible only in a positive range, as is required for variability parameters [@Depaoli_2021]. Furthermore, the right panel of @fig-sigma_prior_normalLMM1 shows that when transformed to the entropy scale, the model expects predictions to fall beyond the feasible range of the outcome.

```{r}
#| label: fig-sigma_prior_normalLMM1
#| fig-cap: 'Normal LMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
param_pscale = rexp( n=n, rate=2 ) # <3>
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale ) # <4>
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3), # <5>
  show.HPDI=0.95,
  main='Parameter scale', xlab=expression(sigma))
dens( param_oscale, xlim=c(0,1), # <6>
  show.HPDI=0.95,
  main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. parameter scale
4. entropy scale
5. density plot: unrestricted parameter scale
6. density plot: bounded entropy scale

Furthermore, as described in @sec-normal_LMM and @sec-fitted, there is a possibility that the model considers one $\sigma_{i}$ prior for each of the speakers in the data. This choice implies that the presumed uncertainty about the unexplained variability of the average entropy at the word-level is similar for each speaker, prior to observing empirical data. In this case the parameters are defined in terms of hyperpriors (refer to @sec-hyperpriors).
$$\begin{align}r_{S} &\sim \text{Exponential}\left( 2 \right) \\\sigma_{i} &\sim \text{Exponential}\left( r_{S} \right)\end{align} $$ {#eq-normal_LMM_sigma_prior2}

The left panel of @fig-sigma_prior_normalLMM2 shows the weakly informative prior expects $\sigma_{i}$ to be possible only in a positive range, as is required for variability parameters [@Depaoli_2021]. The panel also shows the parameters are most likely to fall in the interval $[0, 2.5]$. Moreover, the right panel of @fig-sigma_prior_normalLMM2 shows that when the prior is transformed to the entropy scale, the model expects scores to fall beyond the feasible range of the outcome.

```{r}
#| label: fig-sigma_prior_normalLMM2
#| fig-cap: 'Normal LMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_pscale = r_s * z_s # <4>
param_oscale = rnorm( n=n, mean=0.5, sd=param_pscale ) # <5>
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,3), # <6>
  show.HPDI=0.95,
  main='Parameter scale', xlab=expression(sigma[i]))
dens( param_oscale, xlim=c(0,1), # <7>
  show.HPDI=0.95,
  main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. density plot: unrestricted parameter scale
7. density plot: bounded entropy scale

#### Intercepts $\alpha$

This parameter is used in preliminary models where no mathematical formulations regarding how speaker-related factors influence intelligibility are investigated. The prior distribution for $\alpha$ under the Normal LMM is described in @eq-alpha_prior_normal.

$$\begin{align}\alpha &\sim \text{Normal} \left( 0, 0.05 \right)\end{align} $$ {#eq-alpha_prior_normal}

The left panel of @fig-alpha_prior_normalLMM shows the prior is narrowly concentrated around zero.
Moreover, the right panel of @fig-alpha_prior_normalLMM demonstrates that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at low levels of the feasible range of the outcome. This implies that a particular bias in entropy scores towards lower values is expected a priori.

```{r}
#| label: fig-alpha_prior_normalLMM
#| fig-cap: 'Normal LMM, general intercept prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_hscale = r_s * z_s
param_pscale = rnorm( n=n, mean=0, sd=0.05 ) # <4>
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale ) # <5>
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5), # <6>
  show.HPDI=0.95,
  main='Parameter scale', xlab=expression(alpha))
dens( param_oscale, xlim=c(0,1), # <7>
  show.HPDI=0.95,
  main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Hearing status effects $\alpha_{HS[i]}$

The prior distribution for the Normal LMM is described in @eq-alphaHS_prior_normal. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.

$$\begin{align}\alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right)\end{align} $$ {#eq-alphaHS_prior_normal}

The left panel of @fig-alphaHS_prior_normalLMM reveals a weakly informative prior that restricts the probable values of $\alpha_{HS[i]}$ such that the implied average entropy lies between $[0.3, 0.7]$. This implies that no particular bias towards entropy values above or below $0.5$ for different hearing status groups is present in the priors.
However, the right panel of @fig-alphaHS_prior_normalLMM demonstrates that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around low levels of entropy, but also beyond the feasible range of the outcome.

```{r}
#| label: fig-alphaHS_prior_normalLMM
#| fig-cap: 'Normal LMM, hearing status effects prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_hscale = r_s * z_s
param_pscale = rnorm( n=n, mean=0, sd=0.2 ) # <4>
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale ) # <5>
par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <6>
  show.HPDI=0.95,
  main='Parameter scale', xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1), # <7>
  show.HPDI=0.95,
  main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Chronological age per hearing status $\beta_{A,HS[i]}$

The prior distribution for the Normal LMM is described in @eq-betaHS_prior_normal. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of entropy attributed to chronological age between the categories is presumed to have similar uncertainties prior to the observation of empirical data.

$$\begin{align}\beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right)\end{align} $$ {#eq-betaHS_prior_normal}

The left panel of @fig-betaHS_prior_normalLMM shows the prior restricts $\beta_{A,HS[i]}$ to be mostly within the range of $[-0.4, 0.4]$. This implies that there is no particular bias towards a positive or negative evolution of entropy scores due to chronological age per hearing status group.
However, the right panel of @fig-betaHS_prior_normalLMM shows that when this prior is transformed to the entropy scale, the model anticipates a concentration of entropy values at lower levels, but it also expects entropy scores significantly beyond the feasible range of the outcome.

```{r}
#| label: fig-betaHS_prior_normalLMM
#| fig-cap: 'Normal LMM, chronological age per hearing status effects prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_hscale = r_s * z_s
param_pscale = rnorm( n=n, mean=0, sd=0.1 ) # <4>
usd = function(i){ param_pscale * data_H$Am[i] } # <5>
param_mscale = sapply( 1:length(data_H$Am), usd ) # <6>
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale ) # <7>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5), # <8>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(beta[AHS]) )
dens( param_oscale, xlim=c(0,1), # <9>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. user defined function
6. parameter scale
7. entropy scale
8. unrestricted parameter scale
9. bounded entropy scale

#### Speaker differences $e_{i}$

The prior distribution of $e_{i}$ for the Normal LMM is described in @eq-ei_prior_normal. The same prior is assigned to each speaker in the sample. This choice implies that differences in entropy scores between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters. In this case the parameters are defined in terms of hyperpriors (refer to @sec-hyperpriors).
$$
\begin{align}
m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{i} &\sim \text{Exponential} \left( 2 \right) \\
e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right)
\end{align}
$$ {#eq-ei_prior_normal}

The left panel of @fig-ei_prior_normalLMM shows the prior anticipates differences in entropy scores between speakers as large as $3$ units of entropy. Moreover, the right panel of @fig-ei_prior_normalLMM demonstrates that when transformed to the entropy scale, the model anticipates a concentration of scores around low levels, but also expects the differences to go well beyond the feasible range of the outcome.

```{r}
#| label: fig-ei_prior_normalLMM
#| fig-cap: 'Normal LMM, speaker differences prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_hscale = r_s * z_s
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <4>
s_i = rexp( n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale ) # <5>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(e[i]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Within sentence-speaker differences $u_{si}$

The prior distribution of $u_{si}$ for the Normal LMM is described in @eq-usi_prior_normal. The same prior is assigned to each sentence within each speaker in the sample. This choice implies that the average entropy score differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters (refer to @sec-hyperpriors).
$$
\begin{align}
m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{u} &\sim \text{Exponential} \left( 2 \right) \\
u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right)
\end{align}
$$ {#eq-usi_prior_normal}

The left panel of @fig-ui_prior_normalLMM shows the prior allows the average differences in entropy among sentences within speakers to be as large as $3$ units of measurement. Furthermore, the right panel of @fig-ui_prior_normalLMM demonstrates that when transformed to the entropy scale, the model anticipates a concentration of scores around mid-levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome.

```{r}
#| label: fig-ui_prior_normalLMM
#| fig-cap: 'Normal LMM, within sentence-speaker average differences prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_hscale = r_s * z_s
m_u = rnorm( n=n, mean=0, sd=0.05 ) # <4>
s_u = rexp( n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_u, sd=s_u )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale ) # <5>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(u[si]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Random block effect $a_{b}$

The prior distribution for the Normal LMM is described in @eq-ab_prior_normal. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to @sec-hyperpriors).
$$
\begin{align}
m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{b} &\sim \text{Exponential} \left( 2 \right) \\
a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right)
\end{align}
$$ {#eq-ab_prior_normal}

The left panel of @fig-ab_prior_normalLMM shows a prior with no particular bias towards differences between blocks above or below zero units of entropy. Nevertheless, the right panel of @fig-ab_prior_normalLMM demonstrates that when the prior is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, but also contemplates differences beyond the feasible range of the outcome.

```{r}
#| label: fig-ab_prior_normalLMM
#| fig-cap: 'Normal LMM, block differences prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_hscale = r_s * z_s
m_b = rnorm( n=n, mean=0, sd=0.05 ) # <4>
s_b = rexp( n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )
param_oscale = rnorm( n=n, mean=param_pscale, sd=param_hscale ) # <5>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1.5,1.5), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(a[b]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Linear predictor $g(\cdot)$

After the careful assessment of the prior implications for each parameter, the expected prior distribution for the linear predictor can be constructed for the Normal LMM.
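Before turning to the simulation itself, the recurring "beyond the feasible range" observations above can be quantified directly. The following is a minimal sketch, not part of the original analysis, that approximates the share of prior predictive mass outside $[0,1]$ under the priors just reviewed (the shared hyperpriors for $e_{i}$, $u_{si}$ and $a_{b}$, plus the fixed-effect priors):

```r
# Share of prior predictive draws falling outside the feasible entropy range.
# Illustrative only: reuses the prior settings from the preceding chunks.
set.seed(1)                          # reproducible illustration
n <- 1e4
m <- rnorm(n, 0, 0.05)               # shared random-effect hyperprior mean
s <- rexp(n, 2)                      # shared random-effect hyperprior sd
g <- rnorm(n, 0, 0.2) +              # alpha_HS
     rnorm(n, 0, 0.1) +              # beta_A,HS
     rnorm(n, m, s) +                # e_i
     rnorm(n, m, s) +                # u_si
     rnorm(n, m, s)                  # a_b
h <- rnorm(n, g, rexp(n, 2) * rexp(n, 1))  # entropy-scale draws
p_out <- mean(h < 0 | h > 1)         # proportion outside [0, 1]
```

In this sketch a substantial share of the prior predictive draws falls outside $[0,1]$, consistent with the pattern seen in the right-hand panels above.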
The prior predictive simulation can be described as in @eq-g_prior:

$$
\begin{align}
m &\sim \text{Normal} \left( 0, 0.05 \right) \\
s &\sim \text{Exponential} \left( 2 \right) \\
e_{i} &\sim \text{Normal} \left( m, s \right) \\
u_{si} &\sim \text{Normal} \left( m, s \right) \\
a_{b} &\sim \text{Normal} \left( m, s \right) \\
\alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.2 \right) \\
\beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\
g(\cdot) &= \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{si} + a_{b}
\end{align}
$$ {#eq-g_prior}

The left panel of @fig-g_prior_normalLMM shows the prior expects speakers' potential intelligibility scores to be most probable between $[-2.5, 2.5]$, implying that no particular bias towards positive or negative entropy scores is present jointly in these priors. Furthermore, the right panel of @fig-g_prior_normalLMM demonstrates that when transformed to the entropy scale, the model anticipates entropy scores within the feasible range, although with somewhat more probability at the extremes of entropy.

```{r}
#| label: fig-g_prior_normalLMM
#| fig-cap: 'Normal LMM, linear predictor distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_s = rexp( n=n, rate=2 ) # <3>
z_s = rexp( n=n, rate=1 )
param_hscale = r_s * z_s
m = rnorm( n=n, mean=0, sd=0.05 )
s = rexp( n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_si = rnorm( n=n, mean=m, sd=s )
a_b = rnorm( n=n, mean=m, sd=s )
aHS = rnorm( n=n, mean=0, sd=0.2 ) # <4>
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = aHS + bAHS + u_si + e_i + a_b
param_oscale = rnorm( n=n, mean=param_pscale, # <5>
                      sd=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(SI[i]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

### Beta-proportion GLLAMM {#sec-prior_beta_GLLAMM}

For the parameters of the Beta-proportion GLLAMM, weakly informative priors and hyperpriors are established (refer to @sec-prior_effects). The specified priors are as follows:

#### Sample size $M_{i}$

Similar to the Normal LMM, @sec-fitted describes a Beta-proportion GLLAMM that initially considers one $M$ for all speakers in the data. This choice implies that the presumed uncertainty for the unexplained variability of the average entropy at the word level is the same for all speakers, prior to the observation of empirical data.

$$
\begin{align}
M &\sim \text{Exponential}\left( 0.4 \right)
\end{align}
$$ {#eq-beta_GLLAM_m_prior1}

The left and right panels of @fig-m_prior_betaGLLAMM1 demonstrate that the prior on $M$ places most probability in the positive range $[0, 7]$, while predicting scores within the boundaries of the data. This implies that no particular bias is present in the word-level unexplained entropy variability, only that it is positive, as expected for measures of variability.

```{r}
#| label: fig-m_prior_betaGLLAMM1
#| fig-cap: 'Beta-proportion GLLAMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
param_pscale = rexp( n=n, rate=0.4 ) # <3>
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale ) # <4>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,10), # <5>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(M) )
dens( param_oscale, xlim=c(0,1), # <6>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. parameter scale
4. entropy scale
5. density plot: restricted parameter scale
6. density plot: bounded entropy scale

Furthermore, as described in @sec-beta_GLLAMM and @sec-fitted, there is a possibility that the model considers one $M_{i}$ prior for each speaker in the data. This choice implies the presumed uncertainty for the unexplained dispersion of the average entropy at the word level is similar for each speaker, prior to the observation of empirical data. In this case the parameters are defined in terms of hyperpriors (refer to @sec-hyperpriors).

$$
\begin{align}
r_{M} &\sim \text{Exponential}\left( 0.2 \right) \\
M_{i} &\sim \text{Exponential}\left( r_{M} \right)
\end{align}
$$ {#eq-beta_GLLAM_m_prior2}

The left and right panels of @fig-m_prior_betaGLLAMM2 demonstrate that the prior on $M_{i}$ places most probability in the positive range $[0, 20]$, while at the same time predicting data within the boundaries of the entropy scores. This implies that no particular bias is present in the word-level unexplained entropy variability, only that it is positive, as expected for measures of variability.

```{r}
#| label: fig-m_prior_betaGLLAMM2
#| fig-cap: 'Beta-proportion GLLAMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_pscale = r_M * z_M # <4>
param_oscale = rbeta2( n=n, prob=0.5, theta=param_pscale ) # <5>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,20), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(M[i]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. density plot: unrestricted parameter scale
7. density plot: bounded entropy scale

<!-- The value we choose for the prior $\theta$ can be thought of this way: It is the number of new flips of the coin that we would need to make us teeter between the new data and the prior belief about $\mu$. If we would only need a few new flips to sway our beliefs, then our prior beliefs should be represented by a small $\theta$. If we would need a large number of new flips to sway us away from our prior beliefs about $\mu$, then our prior beliefs are worth a very large $\theta$ \cite{Kruschke_2015}. -->

#### Intercepts $\alpha$

Considering that the structural parameters are now interpretable in terms of the (latent) potential intelligibility scores, the general intercept $\alpha$ is used to set the scale of the latent trait, as required in latent variable models [@Depaoli_2021] (refer to @sec-prior_effects). The prior distribution for $\alpha$ under the Beta-proportion GLLAMM is described in @eq-alpha_prior_beta.

$$
\begin{align}
\alpha &\sim \text{Normal} \left( 0, 0.05 \right)
\end{align}
$$ {#eq-alpha_prior_beta}

The left panel of @fig-alpha_prior_betaGLLAMM shows the prior is narrowly concentrated around zero. Moreover, the right panel of @fig-alpha_prior_betaGLLAMM demonstrates that when the parameter is transformed to the entropy scale, the model anticipates entropy scores at mid-levels of the feasible range of the outcome.
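A note on the sampler used throughout this section: `rbeta2` from the `rethinking` package parameterizes the beta distribution by its mean (`prob`) and concentration, or "sample size" (`theta`), rather than by the usual shape parameters. A minimal sketch of that mapping, written with base R's `rbeta` (the helper name is illustrative):

```r
# rbeta2(n, prob, theta) corresponds to the standard beta shapes
#   shape1 = prob * theta,   shape2 = (1 - prob) * theta
rbeta2_sketch <- function(n, prob, theta) {
  rbeta(n, shape1 = prob * theta, shape2 = (1 - prob) * theta)
}

set.seed(1)
draws <- rbeta2_sketch(1e5, prob = 0.5, theta = 4)
c(mean = mean(draws), min = min(draws), max = max(draws))
```

Because the draws are beta-distributed, they are bounded in $[0,1]$ by construction: `prob` governs the expected entropy while `theta` governs its dispersion, which is why the entropy-scale panels for this model never spill outside the feasible range.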
This implies that no particular bias in entropy scores is expected under the prior.

```{r}
#| label: fig-alpha_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, general intercept prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_hscale = r_M * z_M
param_pscale = rnorm( n=n, mean=0, sd=0.05 ) # <4>
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale), # <5>
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-0.5,0.5), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(alpha) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Hearing status effects $\alpha_{HS[i]}$

The prior distribution for the Beta-proportion GLLAMM is described in @eq-alphaHS_prior_beta. Notably, the same prior is applied to both hearing status categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.

$$
\begin{align}
\alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right)
\end{align}
$$ {#eq-alphaHS_prior_beta}

The right panel of @fig-alphaHS_prior_betaGLLAMM demonstrates that when the $\alpha_{HS[i]}$ prior is transformed to the entropy scale, the model anticipates a concentration of data around mid-levels of entropy, and not beyond the feasible range of the outcome.
This implies that no particular bias towards specific entropy score values is expected from using the prior.

```{r}
#| label: fig-alphaHS_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, hearing status effects prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_hscale = r_M * z_M
param_pscale = rnorm( n=n, mean=0, sd=0.3 ) # <4>
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale), # <5>
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(alpha[HS]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Chronological age per hearing status $\beta_{A,HS[i]}$

The prior distribution for the Beta-proportion GLLAMM is described in @eq-betaHS_prior_beta. Notably, the same prior is applied to both hearing status categories. This choice implies that the evolution of potential intelligibility attributed to chronological age between the categories is presumed to have similar uncertainties, prior to the observation of empirical data.

$$
\begin{align}
\beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right)
\end{align}
$$ {#eq-betaHS_prior_beta}

The left panel of @fig-betaHS_prior_betaGLLAMM shows the weakly informative prior has no particular bias towards a positive or negative evolution of potential intelligibility due to chronological age per hearing status group.
Furthermore, the right panel of @fig-betaHS_prior_betaGLLAMM demonstrates that when transformed to the entropy scale, the model anticipates a slight concentration of data around mid-levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome.

```{r}
#| label: fig-betaHS_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, chronological age per hearing status effects prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_hscale = r_M * z_M
param_pscale = rnorm( n=n, mean=0, sd=0.1 ) # <4>
usd = function(i){ param_pscale * data_H$Am[i] } # <5>
param_mscale = sapply( 1:length(data_H$Am), usd ) # <6>
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale), # <7>
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <8>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(beta[AHS]) )
dens( param_oscale, xlim=c(0,1), # <9>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. user defined function
6. parameter scale
7. entropy scale
8. unrestricted parameter scale
9. bounded entropy scale

#### Speaker differences $e_{i}$

The prior distribution for the Beta-proportion GLLAMM is described in @eq-ei_prior_beta. The same prior is assigned to each speaker in the sample. This choice implies that differences in potential intelligibility between speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters, called hyperpriors (refer to @sec-hyperpriors).
$$
\begin{align}
m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{i} &\sim \text{Exponential} \left( 2 \right) \\
e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right)
\end{align}
$$ {#eq-ei_prior_beta}

The left panel of @fig-ei_prior_betaGLLAMM shows the prior anticipates differences in intelligibility between speakers as large as $3$ units of measurement. Furthermore, the right panel of @fig-ei_prior_betaGLLAMM demonstrates that when transformed to the entropy scale, the model anticipates a high concentration around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected from using this prior.

```{r}
#| label: fig-ei_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, speakers differences prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_hscale = r_M * z_M
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <4>
s_i = rexp( n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_i, sd=s_i )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale), # <5>
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(e[i]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Average within sentence-speaker differences $u_{i}$

The prior distribution of $u_{i}$ for the Beta-proportion GLLAMM is described in @eq-ui_prior_beta. The same prior is assigned to each sentence within each speaker in the sample.
This choice implies that the average potential intelligibility differences among sentences within speakers are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to @sec-hyperpriors). The within sentence-speaker differences are then aggregated to the speaker level to form the sentence random effects $u_{i}$,

$$
\begin{align}
m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{u} &\sim \text{Exponential} \left( 2 \right) \\
u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\
u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S}
\end{align}
$$ {#eq-ui_prior_beta}

The left panel of @fig-ui_prior_betaGLLAMM shows the prior allows the average differences in potential intelligibility among sentences within speakers to be as large as $0.8$ units of measurement. Furthermore, the right panel of @fig-ui_prior_betaGLLAMM demonstrates that when $u_{i}$ is transformed to the entropy scale, the model anticipates a high concentration of scores around mid-levels of entropy. However, it does not expect data beyond the feasible range of the outcome.
This implies that no particular bias towards positive or negative differences in potential intelligibility between speakers is expected.

```{r}
#| label: fig-ui_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, average within sentence-speaker differences prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_hscale = r_M * z_M
m_u = rnorm( n=n, mean=0, sd=0.05 ) # <4>
s_u = rexp( n=n, rate=2 )
param_pscale = replicate( n=n, expr=mean( rnorm( n=10, mean=m_u, sd=s_u ) ) )
param_oscale = rbeta2( n=n, prob=inv_logit(-1*param_pscale), # <5>
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(u[i]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Random block effect $a_{b}$

The prior distribution for the Beta-proportion GLLAMM is described in @eq-ab_prior_beta. The same prior is assigned to each block. This choice implies that the average entropy score differences among blocks are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of hyperpriors (refer to @sec-hyperpriors).

$$
\begin{align}
m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{b} &\sim \text{Exponential} \left( 2 \right) \\
a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right)
\end{align}
$$ {#eq-ab_prior_beta}

The left panel of @fig-ab_prior_betaGLLAMM shows a prior with no particular bias towards positive or negative differences between blocks.
Furthermore, the right panel of @fig-ab_prior_betaGLLAMM demonstrates that when transformed to the entropy scale, the model anticipates a high concentration of data around mid-levels of entropy, but not beyond the feasible range of the outcome.

```{r}
#| label: fig-ab_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, block differences prior distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_hscale = r_M * z_M
m_b = rnorm( n=n, mean=0, sd=0.05 ) # <4>
s_b = rexp( n=n, rate=2 )
param_pscale = rnorm( n=n, mean=m_b, sd=s_b )
param_oscale = rbeta2( n=n, prob=inv_logit(param_pscale), # <5>
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(a[b]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

#### Speech intelligibility $SI_{i}$

After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed for the Beta-proportion GLLAMM.
The prior predictive simulation can be described as in @eq-SI_prior:

$$
\begin{align}
\alpha &\sim \text{Normal} \left( 0, 0.05 \right) \\
\alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \\
\beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\
m &\sim \text{Normal} \left( 0, 0.05 \right) \\
s &\sim \text{Exponential} \left( 2 \right) \\
e_{i} &\sim \text{Normal} \left( m, s \right) \\
u_{si} &\sim \text{Normal} \left( m, s \right) \\
u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \\
a_{b} &\sim \text{Normal} \left( m, s \right) \\
SI_{i} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i}
\end{align}
$$ {#eq-SI_prior}

The left panel of @fig-SI_prior_betaGLLAMM shows the prior expects speakers' potential intelligibility scores to be most probable between $[-3, 3]$, implying no particular bias towards positive or negative potential intelligibility is present jointly in these priors. Furthermore, the right panel of @fig-SI_prior_betaGLLAMM demonstrates that when transformed to the entropy scale, the model anticipates entropy scores within the feasible range, although with somewhat more probability at the extremes of entropy.

```{r}
#| label: fig-SI_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, potential intelligibility distribution: parameter and entropy scale'
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>
n = 1000 # <2>
r_M = rexp( n=n, rate=0.2 ) # <3>
z_M = rexp( n=n, rate=1 )
param_hscale = r_M * z_M
m = rnorm( n=n, mean=0, sd=0.05 )
s = rexp( n=n, rate=2 )
e_i = rnorm( n=n, mean=m, sd=s )
u_i = replicate( n=n, expr=mean( rnorm( n=10, mean=m, sd=s ) ) )
a_b = rnorm( n=n, mean=m, sd=s )
a = rnorm( n=n, mean=0, sd=0.05 ) # <4>
aHS = rnorm( n=n, mean=0, sd=0.3 )
bAHS = rnorm( n=n, mean=0, sd=0.1 )
param_pscale = a + aHS + bAHS + e_i + u_i
param_oscale = rbeta2( n=n, prob=inv_logit(a_b-param_pscale), # <5>
                       theta=param_hscale )

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-3,3), # <6>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(SI[i]) )
dens( param_oscale, xlim=c(0,1), # <7>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. hyperpriors scale
4. parameter scale
5. entropy scale
6. unrestricted parameter scale
7. bounded entropy scale

## Fitted models {#sec-fitted}

This study evaluates the comparative predictive capabilities of both the Normal LMM and the Beta-proportion GLLAMM (RQ1) while simultaneously examining various formulations regarding how speaker-related factors influence intelligibility (RQ3). In this context, the predictive capabilities of the models are intricately connected to these formulations. As a result, the study requires fitting $12$ different models, each representing a specific way to investigate one or both research questions. The models comprise six versions of both the Normal LMM and the Beta-proportion GLLAMM. The differences among the models hinge on (1) whether they address data clustering in conjunction with measurement error, denoted as the model type, (2) the assumed distribution for the entropy scores, which aims to handle boundedness, (3) whether the model incorporates a robust feature to address mild or moderate departures of the data from distributional assumptions, and (4) the inclusion or exclusion of speaker-related factors in the models. A detailed overview of the fitted models is available in @tbl-fitted.
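The varying dimensions just listed can be enumerated as a model grid. A minimal R sketch (the labels are illustrative shorthand for the columns of @tbl-fitted, not code from the analysis):

```r
# Enumerate the 12 fitted models: 3 fixed-effect structures x
# 2 robust variants x 2 outcome distributions, ordered as in the table
models <- expand.grid(
  fixed_effects = c('none', 'bHS + bA', 'bHS + bA:HS'),
  robust        = c('No', 'Yes'),
  model_type    = c('Normal LMM', 'Beta-prop. GLLAMM'),
  stringsAsFactors = FALSE
)
models <- cbind(model = seq_len(nrow(models)), models)
nrow(models)  # 12 candidate models
```

With `fixed_effects` varying fastest and `model_type` slowest, the row order of the grid matches the numbering of the models in the table below.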
+-------+---------+--------------+---------+----------------+-------------+-------------------+
|       | Model   | Entropy      | Robust  | Fixed effects  |             |                   |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| Model | type    | distribution | feature | $\beta_{HS[i]}$| $\beta_{A}$ | $\beta_{A,HS[i]}$ |
+:=====:+:=======:+:============:+:=======:+:==============:+:===========:+:=================:+
| 1     | LMM     | Normal       | No      | No             | No          | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 2     | LMM     | Normal       | No      | Yes            | Yes         | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 3     | LMM     | Normal       | No      | Yes            | No          | Yes               |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 4     | LMM     | Normal       | Yes     | No             | No          | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 5     | LMM     | Normal       | Yes     | Yes            | Yes         | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 6     | LMM     | Normal       | Yes     | Yes            | No          | Yes               |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 7     | GLLAMM  | Beta-prop.   | No      | No             | No          | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 8     | GLLAMM  | Beta-prop.   | No      | Yes            | Yes         | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 9     | GLLAMM  | Beta-prop.   | No      | Yes            | No          | Yes               |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 10    | GLLAMM  | Beta-prop.   | Yes     | No             | No          | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 11    | GLLAMM  | Beta-prop.   | Yes     | Yes            | Yes         | No                |
+-------+---------+--------------+---------+----------------+-------------+-------------------+
| 12    | GLLAMM  | Beta-prop.   | Yes     | Yes            | No          | Yes               |
+-------+---------+--------------+---------+----------------+-------------+-------------------+

: Fitted models. {#tbl-fitted .striped .hover}

The following tabset panel provides the commented `Stan` code for all fitted models. Furthermore, the models are implemented using *non-centered priors* (refer to @sec-hyperpriors).

::: {.panel-tabset}

## model

## 1

```{r}
mcmc_code = "
data{
  // dimensions
  int N;                     // number of experimental runs
  int B;                     // max. number of blocks
  int I;                     // max. number of experimental units (speakers)
  int U;                     // max. number of sentences
  int W;                     // max. number of words

  // category numbers
  int cHS;                   // max. number of categories in HS

  // data
  array[N] int<lower=1, upper=B> bid;    // block id
  array[N] int<lower=1, upper=I> cid;    // speaker's id
  array[N] int<lower=1, upper=U> uid;    // sentence's id
  array[N] int<lower=1, upper=cHS> HS;   // hearing status
  array[N] real Am;                      // chron. age - min( chron. age )
  array[N] real Hwsib;                   // replicated entropies
}
parameters{
  // fixed effects parameters
  real a;                    // intercept
  // vector[cHS] aHS;        // HS effects
  // real bAm;               // Am effects
  // vector[cHS] bAmHS;      // Am effects (per HS)

  // random effects parameters
  real m_b;                  // block RE mean
  real<lower=0> s_b;         // block RE sd
  vector[B] z_b;             // non-centered block RE
  real m_i;                  // speaker RE mean
  real<lower=0> s_i;         // speaker RE sd
  vector[I] z_i;             // non-centered speaker RE
  real m_u;                  // speaker, utterance RE mean
  real<lower=0> s_u;         // speaker, utterance RE sd
  matrix[I,U] z_u;           // non-centered speaker, utterance RE

  // variability parameters
  real<lower=0> s_w;         // speaker, utterance, word sd (one overall)
  //real<lower=0> r_s;       // global rate for SD
  //vector<lower=0>[I] z_s;  // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
  // to track
  vector[B] b_i;             // block random effects
  vector[I] e_i;             // speaker random effects
  matrix[I,U] u_si;          // sentence random effects
  //vector[I] s_w;           // speaker, utterance, word sd (one per speaker)
  vector[N] mu;              // NO TRACK

  // random effects
  b_i = m_b + s_b*z_b;       // non-centered block RE
  e_i = m_i + s_i*z_i;       // non-centered speaker RE
  u_si = m_u + s_u*z_u;      // non-centered utterance RE
  //s_w = z_s * r_s;         // non-centered speaker, utterance, word sd

  // average entropy
  for(n in 1:N){
    mu[n] = a +
            // aHS[ HS[n] ] +
            // bAm*Am[n] +
            // bAmHS[ HS[n] ]*Am[n] +
            b_i[ bid[n] ] +
            e_i[ cid[n] ] +
            u_si[ cid[n], uid[n] ];
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  // aHS ~ normal( 0, 0.2 );
  // bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );

  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();

  // variability priors
  s_w ~ exponential( 2 );
  //r_s ~ exponential( 2 );
  //z_s ~ exponential( 1 );

  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ normal( mu[n] , s_w );
    // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik;         // log-likelihood
  for(n in 1:N){
    log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
    // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
  }
}
"

# saving
model_nam = "model01.stan"
writeLines( mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```

## 2

```{r}
mcmc_code = "
data{
  // dimensions
  int N;                     // number of experimental runs
  int B;                     // max. number of blocks
  int I;                     // max. number of experimental units (speakers)
  int U;                     // max. number of sentences
  int W;                     // max. number of words

  // category numbers
  int cHS;                   // max. number of categories in HS

  // data
  array[N] int<lower=1, upper=B> bid;    // block id
  array[N] int<lower=1, upper=I> cid;    // speaker's id
  array[N] int<lower=1, upper=U> uid;    // sentence's id
  array[N] int<lower=1, upper=cHS> HS;   // hearing status
  array[N] real Am;                      // chron. age - min( chron. age )
  array[N] real Hwsib;                   // replicated entropies
}
parameters{
  // fixed effects parameters
  //real a;                  // intercept
  vector[cHS] aHS;           // HS effects
  real bAm;                  // Am effects
  // vector[cHS] bAmHS;      // Am effects (per HS)

  // random effects parameters
  real m_b;                  // block RE mean
  real<lower=0> s_b;         // block RE sd
  vector[B] z_b;             // non-centered block RE
  real m_i;                  // speaker RE mean
  real<lower=0> s_i;         // speaker RE sd
  vector[I] z_i;             // non-centered speaker RE
  real m_u;                  // speaker, utterance RE mean
  real<lower=0> s_u;         // speaker, utterance RE sd
  matrix[I,U] z_u;           // non-centered speaker, utterance RE

  // variability parameters
  real<lower=0> s_w;         // speaker, utterance, word sd (one overall)
  //real<lower=0> r_s;       // global rate for SD
  //vector<lower=0>[I] z_s;  // non-centered speaker, utterance, word sd (one per speaker)
}
transformed parameters{
  // to track
  vector[B] b_i;             // block random effects
  vector[I] e_i;             // speaker random effects
  matrix[I,U] u_si;          // sentence random effects
  //vector[I] s_w;           // speaker, utterance, word sd (one per speaker)
  vector[N] mu;              // NO TRACK

  // random effects
  b_i = m_b + s_b*z_b;       // non-centered block RE
  e_i = m_i + s_i*z_i;       // non-centered speaker RE
  u_si = m_u + s_u*z_u;      // non-centered utterance RE
  //s_w = z_s * r_s;         // non-centered speaker, utterance, word sd

  // average entropy
  for(n in 1:N){
    mu[n] = //a +
            aHS[ HS[n] ] +
            bAm*Am[n] +
            // bAmHS[ HS[n] ]*Am[n] +
            b_i[ bid[n] ] +
            e_i[ cid[n] ] +
            u_si[ cid[n], uid[n] ];
  }
}
model{
  // fixed effects priors
  //a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0, 0.2 );
  bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );

  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();

  // variability priors
  s_w ~ exponential( 2 );
  //r_s ~ exponential( 2 );
  //z_s ~ exponential( 1 );

  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ normal( mu[n] , s_w );
    //
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w ); // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] ); }}"# savingmodel_nam ="model02.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 3```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters //real a; // intercept vector[cHS] aHS; // HS effects // real bAm; // Am effects vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters real<lower=0> s_w; // speaker, utterance, word sd (one overall) //real<lower=0> r_s; // global rate for SD //vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects //vector[I] s_w; // speaker, utterance, word sd (one per speaker) vector[N] mu; // NO TRACK // random 
effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE //s_w = z_s * r_s; // non-centered speaker, utterance, word sd // average entropy for(n in 1:N){ mu[n] = //a + aHS[ HS[n] ] + // bAm*Am[n] + bAmHS[ HS[n] ]*Am[n] + b_i[ bid[n] ] + e_i[ cid[n] ] + u_si[ cid[n], uid[n] ]; }}model{ // fixed effects priors //a ~ normal( 0 , 0.05 ); aHS ~ normal( 0, 0.2 ); // bAm ~ normal( 0 , 0.1 ); bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors s_w ~ exponential( 2 ); //r_s ~ exponential( 2 ); //z_s ~ exponential( 1 ); // likelihood for(n in 1:N){ Hwsib[n] ~ normal( mu[n] , s_w ); // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w ); // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] ); }}"# savingmodel_nam ="model03.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 4```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. 
age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters real a; // intercept // vector[cHS] aHS; // HS effects // real bAm; // Am effects // vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters // real<lower=0> s_w; // speaker, utterance, word sd (one overall) real<lower=0> r_s; // global rate for SD vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] s_w; // speaker, utterance, word sd (one per speaker) vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE s_w = z_s * r_s; // non-centered speaker, utterance, word sd // average entropy for(n in 1:N){ mu[n] = a + // aHS[ HS[n] ] + // bAm*Am[n] + // bAmHS[ HS[n] ]*Am[n] + b_i[ bid[n] ] + e_i[ cid[n] ] + u_si[ cid[n], uid[n] ]; }}model{ // fixed effects priors a ~ normal( 0 , 0.05 ); // aHS ~ normal( 0, 0.2 ); // bAm ~ normal( 0 , 0.1 ); // bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors //s_w ~ exponential( 2 ); r_s ~ exponential( 2 ); z_s ~ exponential( 1 ); // likelihood for(n in 1:N){ // Hwsib[n] ~ normal( mu[n] , 
s_w ); Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w ); log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] ); }}"# savingmodel_nam ="model04.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 5```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters //real a; // intercept vector[cHS] aHS; // HS effects real bAm; // Am effects // vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters // real<lower=0> s_w; // speaker, utterance, word sd (one overall) real<lower=0> r_s; // global rate for SD vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] s_w; // speaker, utterance, word sd (one per speaker) vector[N] mu; // NO TRACK // random 
effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE s_w = z_s * r_s; // non-centered speaker, utterance, word sd // average entropy for(n in 1:N){ mu[n] = //a + aHS[ HS[n] ] + bAm*Am[n] + // bAmHS[ HS[n] ]*Am[n] + b_i[ bid[n] ] + e_i[ cid[n] ] + u_si[ cid[n], uid[n] ]; }}model{ // fixed effects priors //a ~ normal( 0 , 0.05 ); aHS ~ normal( 0, 0.2 ); bAm ~ normal( 0 , 0.1 ); //bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors //s_w ~ exponential( 2 ); r_s ~ exponential( 2 ); z_s ~ exponential( 1 ); // likelihood for(n in 1:N){ // Hwsib[n] ~ normal( mu[n] , s_w ); Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w ); log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] ); }}"# savingmodel_nam ="model05.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 6```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. 
age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters //real a; // intercept vector[cHS] aHS; // HS effects // real bAm; // Am effects vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters // real<lower=0> s_w; // speaker, utterance, word sd (one overall) real<lower=0> r_s; // global rate for SD vector<lower=0>[I] z_s; // non-centered speaker, utterance, word sd (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] s_w; // speaker, utterance, word sd (one per speaker) vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE s_w = z_s * r_s; // non-centered speaker, utterance, word sd // average entropy for(n in 1:N){ mu[n] = //a + aHS[ HS[n] ] + // bAm*Am[n] + bAmHS[ HS[n] ]*Am[n] + b_i[ bid[n] ] + e_i[ cid[n] ] + u_si[ cid[n], uid[n] ]; }}model{ // fixed effects priors //a ~ normal( 0 , 0.05 ); aHS ~ normal( 0, 0.2 ); // bAm ~ normal( 0 , 0.1 ); bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors //s_w ~ exponential( 2 ); r_s ~ exponential( 2 ); z_s ~ exponential( 1 ); // likelihood for(n in 1:N){ // Hwsib[n] ~ normal( mu[n] , s_w ); 
Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w ); log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] ); }}"# savingmodel_nam ="model06.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 7```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters real a; // intercept // vector[cHS] aHS; // HS effects // real bAm; // Am effects // vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters real<lower=0> Mw; // 'sample size' parameter //real<lower=0> r_M; // global rate for 'sample size' //vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] u_i; // sentence average random effects //vector[I] Mw; // non-centered 'sample size' (one per speaker) vector[I] 
SI; // SI index vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE //Mw = z_M * r_M; // non-centered 'sample size' // intelligibility and average entropy for(i in 1:I){ u_i[ i ] = mean( u_si[ i, ] ); } for(n in 1:N){ SI[ cid[n] ] = a + // aHS[ HS[n] ] + // bAm*Am[n] + // bAmHS[ HS[n] ]*Am[n] + e_i[ cid[n] ] + u_i[ cid[n] ]; mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] ); }}model{ // fixed effects priors a ~ normal( 0 , 0.05 ); // aHS ~ normal( 0 , 0.3 ); // bAm ~ normal( 0 , 0.1 ); // bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors Mw ~ exponential( 0.4 ); //r_M ~ exponential( 0.2 ); //z_M ~ exponential( 1 ); // likelihood for(n in 1:N){ Hwsib[n] ~ beta_proportion( mu[n] , Mw ); // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw ); // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] ); }}"# savingmodel_nam ="model07.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 8```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. 
number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters real a; // intercept vector[cHS] aHS; // HS effects real bAm; // Am effects // vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters real<lower=0> Mw; // 'sample size' parameter //real<lower=0> r_M; // global rate for 'sample size' //vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] u_i; // sentence average random effects //vector[I] Mw; // non-centered 'sample size' (one per speaker) vector[I] SI; // SI index vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE //Mw = z_M * r_M; // non-centered 'sample size' // intelligibility and average entropy for(i in 1:I){ u_i[ i ] = mean( u_si[ i, ] ); } for(n in 1:N){ SI[ cid[n] ] = a + aHS[ HS[n] ] + bAm*Am[n] + // bAmHS[ HS[n] ]*Am[n] + e_i[ cid[n] ] + u_i[ cid[n] ]; mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] ); }}model{ // fixed effects priors a ~ normal( 0 , 0.05 ); aHS ~ normal( 0 , 0.3 ); bAm ~ normal( 0 , 0.1 ); // bAmHS ~ normal( 
0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors Mw ~ exponential( 0.4 ); //r_M ~ exponential( 0.2 ); //z_M ~ exponential( 1 ); // likelihood for(n in 1:N){ Hwsib[n] ~ beta_proportion( mu[n] , Mw ); // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw ); // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] ); }}"# savingmodel_nam ="model08.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 9```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. 
age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters real a; // intercept vector[cHS] aHS; // HS effects // real bAm; // Am effects vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters real<lower=0> Mw; // 'sample size' parameter //real<lower=0> r_M; // global rate for 'sample size' //vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] u_i; // sentence average random effects //vector[I] Mw; // non-centered 'sample size' (one per speaker) vector[I] SI; // SI index vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE //Mw = z_M * r_M; // non-centered 'sample size' // intelligibility and average entropy for(i in 1:I){ u_i[ i ] = mean( u_si[ i, ] ); } for(n in 1:N){ SI[ cid[n] ] = a + aHS[ HS[n] ] + // bAm*Am[n] + bAmHS[ HS[n] ]*Am[n] + e_i[ cid[n] ] + u_i[ cid[n] ]; mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] ); }}model{ // fixed effects priors a ~ normal( 0 , 0.05 ); aHS ~ normal( 0 , 0.3 ); // bAm ~ normal( 0 , 0.1 ); bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors Mw ~ 
exponential( 0.4 ); //r_M ~ exponential( 0.2 ); //z_M ~ exponential( 1 ); // likelihood for(n in 1:N){ Hwsib[n] ~ beta_proportion( mu[n] , Mw ); // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw ); // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] ); }}"# savingmodel_nam ="model09.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 10```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. 
age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters real a; // intercept // vector[cHS] aHS; // HS effects // real bAm; // Am effects // vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters // real<lower=0> Mw; // 'sample size' parameter real<lower=0> r_M; // global rate for 'sample size' vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] u_i; // sentence average random effects vector[I] Mw; // non-centered 'sample size' (one per speaker) vector[I] SI; // SI index vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE Mw = z_M * r_M; // non-centered 'sample size' // intelligibility and average entropy for(i in 1:I){ u_i[ i ] = mean( u_si[ i, ] ); } for(n in 1:N){ SI[ cid[n] ] = a + // aHS[ HS[n] ] + // bAm*Am[n] + // bAmHS[ HS[n] ]*Am[n] + e_i[ cid[n] ] + u_i[ cid[n] ]; mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] ); }}model{ // fixed effects priors a ~ normal( 0 , 0.05 ); // aHS ~ normal( 0 , 0.3 ); // bAm ~ normal( 0 , 0.1 ); // bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability 
priors //Mw ~ exponential( 0.4 ); r_M ~ exponential( 0.2 ); z_M ~ exponential( 1 ); // likelihood for(n in 1:N){ // Hwsib[n] ~ beta_proportion( mu[n] , Mw ); Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw ); log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] ); }}"# savingmodel_nam ="model10.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 11```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. 
age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters real a; // intercept vector[cHS] aHS; // HS effects real bAm; // Am effects // vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters // real<lower=0> Mw; // 'sample size' parameter real<lower=0> r_M; // global rate for 'sample size' vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] u_i; // sentence average random effects vector[I] Mw; // non-centered 'sample size' (one per speaker) vector[I] SI; // SI index vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE Mw = z_M * r_M; // non-centered 'sample size' // intelligibility and average entropy for(i in 1:I){ u_i[ i ] = mean( u_si[ i, ] ); } for(n in 1:N){ SI[ cid[n] ] = a + aHS[ HS[n] ] + bAm*Am[n] + // bAmHS[ HS[n] ]*Am[n] + e_i[ cid[n] ] + u_i[ cid[n] ]; mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] ); }}model{ // fixed effects priors a ~ normal( 0 , 0.05 ); aHS ~ normal( 0 , 0.3 ); bAm ~ normal( 0 , 0.1 ); // bAmHS ~ normal( 0 , 0.1 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors //Mw ~ 
exponential( 0.4 ); r_M ~ exponential( 0.2 ); z_M ~ exponential( 1 ); // likelihood for(n in 1:N){ // Hwsib[n] ~ beta_proportion( mu[n] , Mw ); Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] ); }}generated quantities{ // track vector[N] log_lik; // log-likelihood for(n in 1:N){ // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw ); log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] ); }}"# savingmodel_nam ="model11.stan"writeLines(mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )```## 12```{r}mcmc_code ="data{ // dimensions int N; // number of experimental runs int B; // max. number of blocks int I; // max. number of experimental units (speakers) int U; // max. number of sentences int W; // max. number of words // category numbers int cHS; // max. number of categories in HS // data array[N] int<lower=1, upper=B> bid; // block id array[N] int<lower=1, upper=I> cid; // speaker's id array[N] int<lower=1, upper=U> uid; // sentence's id array[N] int<lower=1, upper=cHS> HS; // hearing status array[N] real Am; // chron. age - min( chron. 
age ) array[N] real Hwsib; // replicated entropies}parameters{ // fixed effects parameters real a; // intercept vector[cHS] aHS; // HS effects // real bAm; // Am effects vector[cHS] bAmHS; // Am effects (per HS) // random effects parameters real m_b; // block RE mean real<lower=0> s_b; // block RE sd vector[B] z_b; // non-centered block RE real m_i; // speaker RE mean real<lower=0> s_i; // speaker RE sd vector[I] z_i; // non-centered speaker RE real m_u; // speaker, utterance RE mean real<lower=0> s_u; // speaker, utterance RE sd matrix[I,U] z_u; // non-centered speaker, utterance RE // variability parameters // real<lower=0> Mw; // 'sample size' parameter real<lower=0> r_M; // global rate for 'sample size' vector<lower=0>[I] z_M; // non-centered 'sample size' (one per speaker)}transformed parameters{ // to track vector[B] b_i; // block random effects vector[I] e_i; // speaker random effects matrix[I,U] u_si; // sentence random effects vector[I] u_i; // sentence average random effects vector[I] Mw; // non-centered 'sample size' (one per speaker) vector[I] SI; // SI index vector[N] mu; // NO TRACK // random effects b_i = m_b + s_b*z_b; // non-centered block RE e_i = m_i + s_i*z_i; // non-centered speaker RE u_si = m_u + s_u*z_u; // non-centered utterance RE Mw = z_M * r_M; // non-centered 'sample size' // intelligibility and average entropy for(i in 1:I){ u_i[ i ] = mean( u_si[ i, ] ); } for(n in 1:N){ SI[ cid[n] ] = a + aHS[ HS[n] ] + // bAm*Am[n] + bAmHS[ HS[n] ]*Am[n] + e_i[ cid[n] ] + u_i[ cid[n] ]; mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] ); }}model{ // fixed effects priors a ~ normal( 0 , 0.05 ); aHS ~ normal( 0 , 0.3 ); // bAm ~ normal( 0 , 0.3 ); bAmHS ~ normal( 0 , 0.3 ); // random effects priors m_b ~ normal( 0 , 0.05 ); s_b ~ exponential( 2 ); z_b ~ std_normal(); m_i ~ normal( 0 , 0.05 ); s_i ~ exponential( 2 ); z_i ~ std_normal(); m_u ~ normal( 0 , 0.05 ); s_u ~ exponential( 2 ); to_vector( z_u ) ~ std_normal(); // variability priors //Mw ~ 
exponential( 0.4 );
 r_M ~ exponential( 0.2 );
 z_M ~ exponential( 1 );
 // likelihood
 for(n in 1:N){
  // Hwsib[n] ~ beta_proportion( mu[n] , Mw );
  Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
 }
}
generated quantities{
 // track
 vector[N] log_lik; // log-likelihood
 for(n in 1:N){
  // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
  log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
 }
}"

# saving
model_nam = "model12.stan"
writeLines( mcmc_code, con=file.path(getwd(), 'real_models', model_nam) )
```
:::

Moreover, the following code is provided so the reader can fit all `Stan` models.

::: {.panel-tabset}

## code

## fitting

```{r}
for(i in 1:12){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i, '.stan')
  model_in  = file.path(getwd(), 'real_models')
  model_out = file.path(getwd(), 'real_chain')
  mod = cmdstan_model( file.path(model_in, model_nam) )
  print(model_nam)
  mod$sample( data=dlist,
              output_dir=model_out,
              output_basename = str_replace(model_nam, '.stan', ''),
              num_warmup=2000, num_samples=2000,
              chains=4, parallel_chains=4,
              max_treedepth=20, adapt_delta=0.95 ) #, init=0
}
```
:::

## Estimation {#sec-estimation}

The models were estimated using `R` version 4.2.2 [@R_2015] and `Stan` version 2.26.1 [@Stan_2020]. Four Markov chains were run for each model, each with distinct starting values. Each chain underwent $4,000$ iterations, of which the first $2,000$ served as a warm-up phase and the remaining $2,000$ were treated as samples from the posterior distribution.

## Chain quality and information {#sec-Bquality}

Verification of stationarity, convergence, and mixing of the parameter chains involved graphical analysis and diagnostic statistics. Graphical analysis utilized trace, trace-rank, and autocorrelation (ACF) plots. Diagnostic statistics included the *potential scale reduction factor* statistic $\widehat{\text{R}}$ with a cut-off value of $1.05$ [@Vehtari_et_al_2021].
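For intuition, the core of this diagnostic can be sketched outside the analysis pipeline. The following illustrative Python snippet (not part of the study's `R` code) computes the classic split-chain $\widehat{\text{R}}$, which compares between-chain to within-chain variance; the version of Vehtari et al. [-@Vehtari_et_al_2021] additionally rank-normalizes the draws before this computation:

```python
import statistics

def split_rhat(chains):
    """Classic split-R-hat: split each chain in half and compare
    between-half variance (B) to within-half variance (W)."""
    halves = []
    for c in chains:
        m = len(c) // 2
        halves += [c[:m], c[m:2 * m]]
    n = len(halves[0])
    means = [statistics.mean(h) for h in halves]
    within = [statistics.variance(h) for h in halves]
    W = statistics.mean(within)            # within-chain variance
    B = n * statistics.variance(means)     # between-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled variance estimate
    return (var_hat / W) ** 0.5

# two well-mixed chains targeting the same distribution
import random
random.seed(1)
c1 = [random.gauss(0, 1) for _ in range(1000)]
c2 = [random.gauss(0, 1) for _ in range(1000)]
print(round(split_rhat([c1, c2]), 2))  # ≈ 1.0, below the 1.05 cut-off
```

Chains stuck in different regions inflate $B$ relative to $W$, pushing $\widehat{\text{R}}$ above the $1.05$ threshold.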
Furthermore, to confirm whether the parameters' posterior distributions were generated with a sufficient number of uncorrelated sampling points, each posterior density plot was inspected along with its effective sample size statistic $n_{\text{eff}}$ [@Gelman_et_al_2014].

## Model comparison

<!-- This research uses the *Information-Theoretic Approach (ITA)* [@Anderson_2008; @Chamberlain_1965] for model comparison and inference. The ITA comprises three steps: (1) the expression of the research hypothesis into statistical models, (2) the comparison of the most plausible models, and (3) the production of inferences based on one or multiple selected models. -->
<!-- ::: {.column-margin} -->
<!-- **_Information-Theoretic Approach_** -->
<!-- Approach to model selection and inference composed of three steps: -->
<!-- 1. Express research hypothesis into models, -->
<!-- 2. Select most plausible models, -->
<!-- 3. Produce inferences based on selected models -->
<!-- ::: -->

The study compares the fitted models using three criteria: the *deviance information criterion (DIC)* by Spiegelhalter et al. [@Spiegelhalter_et_al_2002], the *widely applicable information criterion (WAIC)* by Watanabe [-@Watanabe_2013], and the *Pareto-smoothed importance sampling criterion (PSIS)* by Vehtari et al. [-@Vehtari_et_al_2017]. These criteria score models in terms of deviations from *perfect* predictive accuracy, with smaller values indicating less deviation [@McElreath_2020]. Specifically, DIC measures in-sample deviations, while WAIC and PSIS offer an approximate measure of out-of-sample deviations. Deviations from perfect predictive accuracy serve as the closest estimate of the Kullback-Leibler divergence [@Kullback_et_al_1951], which measures the degree to which a model accurately represents the *true* distribution of the data.
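To make these criteria concrete, the following illustrative Python sketch (independent of the `rethinking` functions used later) computes WAIC from a matrix of pointwise log-likelihoods, one row per posterior draw and one column per observation, mirroring the `log_lik` quantities tracked in the `generated quantities` blocks of the `Stan` models above:

```python
import math
import statistics

def waic(log_lik):
    """WAIC = -2 * (lppd - pWAIC).

    lppd sums, per observation, the log of the average likelihood over
    posterior draws; pWAIC sums the per-observation variance of the
    log-likelihood, acting as an effective-parameter penalty.
    """
    S = len(log_lik)        # number of posterior draws
    N = len(log_lik[0])     # number of observations
    lppd = 0.0
    p_waic = 0.0
    for i in range(N):
        col = [log_lik[s][i] for s in range(S)]
        m = max(col)        # log-sum-exp for numerical stability
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / S)
        p_waic += statistics.variance(col)
    return -2.0 * (lppd - p_waic), p_waic
```

Smaller values indicate less deviation from *perfect* predictive accuracy; the second return value corresponds to the complexity penalty reported as `pWAIC` in the comparison tables below.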
Moreover, WAIC and PSIS are considered fully Bayesian criteria, as they incorporate all the information encompassed in the parameters' posterior distributions. This effectively integrates and reports the inherent uncertainty in the predictive accuracy estimates. Predictive accuracy aside, PSIS offers an additional advantage in identifying highly influential data points. To achieve this, the criterion uses a built-in warning system that flags observations that make out-of-sample predictions unreliable. The key intuition is that observations that are relatively unlikely, according to the model, exert more influence and render predictions more unreliable than those relatively expected [@McElreath_2020].

# Results {#sec-results}

This section presents the results of the Bayesian inference procedures, with particular emphasis on answering the three research questions.

The posterior estimates of the models are loaded in the following manner. `file_id()` is a user-defined function that identifies the `stanfit`-generated files within a particular directory.

::: {.panel-tabset}

## code

## loading

```{r}
# load reference models
for(i in 1:12){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i)
  model_out = file.path( save_dir, 'real_chain')
  model_fit = file_id( model_out, model_nam )
  assign( model_nam, rstan::read_stan_csv( file.path( model_out, model_fit ) ) )
}
```
:::

```{r}
#| label: code-models_load
#| fig-cap: ''
#| echo: false

# load reference models
for(i in 1:12){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i)
  model_out = file.path( main_dir, 'real_chain')
  model_fit = file_id( model_out, model_nam )
  assign( model_nam, rstan::read_stan_csv( file.path( model_out, model_fit ) ) )
}
```

## Predictive capabilities of the Beta-proportion GLLAMM compared to the Normal LMM (RQ1) {#sec-results_RQ1}

This research question evaluates the effectiveness of the Beta-proportion GLLAMM in handling the features of entropy scores by comparing its predictive accuracy to that of the Normal LMM.
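As a brief reminder of the likelihood involved: `Stan`'s `beta_proportion(mu, M)` density is the Beta distribution reparameterized by its mean $\mu \in (0,1)$ and a 'sample size' $M > 0$, with standard shape parameters $a = \mu M$ and $b = (1-\mu)M$. An illustrative Python sketch of this mapping (not part of the analysis code):

```python
def beta_proportion_shapes(mu, M):
    """Map the mean/'sample size' parameterization to Beta(a, b) shapes."""
    assert 0.0 < mu < 1.0 and M > 0.0
    return mu * M, (1.0 - mu) * M

def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

a, b = beta_proportion_shapes(mu=0.3, M=10.0)  # a = 3.0, b = 7.0
mean, var = beta_moments(a, b)
print(mean)  # recovers mu = 0.3
```

A larger $M$ concentrates the scores around $\mu$, which is why the models above estimate one `Mw` per speaker (`Mw[ cid[n] ]`): speakers may differ in how tightly their entropy scores cluster around their expected value.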
Models $1$, $4$, $7$, and $10$ are specifically chosen for this comparison because their assumptions exclusively address the features of the scores, without integrating additional covariate information. As detailed in @tbl-fitted, Model $1$ is a Normal LMM that solely addresses data clustering. Building upon this, Model $4$ introduces a robust feature. Conversely, Model $7$ is a Beta-proportion GLLAMM that deals with boundedness, measurement error, and data clustering, and Model $10$ extends this model by incorporating a robust feature.

@fig-RQ1_WAIC.PSIS displays values for the `DIC`, `WAIC`, and `PSIS`. It also includes the components `dWAIC` and `dPSIS`, highlighting the differences in out-of-sample deviations from the best-fitting model and their associated uncertainty. The associated tables provide similar information, while also reporting the `pWAIC` and `pPSIS` values, indicating the penalization received by the models for their complexity (roughly associated with their number of parameters). Lastly, the tables show the `weight` of evidence, which summarizes the relative support for each model.

Overall, all criteria consistently point to Model $10$ as the most plausible choice for the data. The model exhibits the lowest values for both `WAIC` and `PSIS`, establishing itself as the model with the least deviation from *perfect* predictive accuracy among those under comparison. Additionally, @fig-RQ1_WAIC.PSIS visually demonstrates the non-overlapping uncertainty (horizontal blue lines) in the `dWAIC` and `dPSIS` values for Models $1$, $4$, and $7$ when compared to Model $10$. This indicates that Model $10$ deviates significantly less from *perfect* predictive accuracy than the rest of the models.
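Assuming the `weight` column follows the usual Akaike-weight formula, $w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)$, where $\Delta_i$ is each model's difference to the best criterion value, the computation can be sketched in this illustrative Python snippet (independent of the `R` code below):

```python
import math

def model_weights(criterion_values):
    """Akaike-type weights from WAIC/PSIS scores (smaller is better)."""
    best = min(criterion_values)
    rel = [math.exp(-0.5 * (v - best)) for v in criterion_values]
    total = sum(rel)
    return [r / total for r in rel]

# a model far ahead of the alternatives absorbs essentially all the weight
print([round(w, 3) for w in model_weights([-850.0, 120.0, 95.0, 310.0])])
# prints [1.0, 0.0, 0.0, 0.0]
```

The weights sum to one, so they read as relative support for each model within the compared set, not as absolute evidence.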
Lastly, the `weight` of evidence in the tables underscores that $100\%$ of the evidence aligns with and supports Model $10$.```{r}#| label: code-RQ1_WAIC#| fig-cap: ''require(rethinking) # <1>set.seed(12345) # <2>RQ1_WAIC =compare( func=WAIC, # <3> model01, model04, model07, model10 )RQ1_WAIC =cbind( DIC=with(RQ1_WAIC, WAIC-2*pWAIC), # <4> RQ1_WAIC )round( RQ1_WAIC, 3)```1. package requirement2. seed for replication3. comparison of selected models with WAIC4. DIC calculation```{r}#| label: code-RQ1_PSIS#| fig-cap: ''require(rethinking) # <1>set.seed(12345) # <2>RQ1_PSIS =compare( func=PSIS, # <3> model01, model04, model07, model10 )RQ1_PSIS =cbind( DIC=with(RQ1_PSIS, PSIS-2*pPSIS), # <4> RQ1_PSIS )round( RQ1_PSIS, 3)```1. package requirement2. seed for replication3. comparison of selected models with PSIS4. DIC calculation```{r}#| label: fig-RQ1_WAIC.PSIS#| fig-cap: 'WAIC and PSIS model comparison plot. Note: Black and blue points describe point estimates, and continuous horizontal lines indicate the associated uncertainty.'#| fig-height: 7#| fig-width: 6par(mfrow=c(2,1))plot_compare( compare_obj=RQ1_WAIC, # <1>ns=1, m='WAIC', dm=T )plot_compare( compare_obj=RQ1_PSIS, ns=1, m='PSIS', dm=T )par(mfrow=c(1,1))```1. user defined function: plot of Deviance, WAIC, PSIS, and dWAIC with confidence intervalsUpon closer examination, the reasons behind the observed disparities in the models become more apparent. Specifically, @fig-RQ1_pred_speaker highlights that the Normal LMM, as outlined in Model $4$, fails to capture the underlying data patterns, resulting in predictions that are physically inconsistent, falling outside the outcome's range between zero and one. Further insight into this issue is provided by @fig-RQ1_pred_speaker_model04 and @fig-RQ1_model_outliers. @fig-RQ1_pred_speaker_model04 displays Model $4$'s score prediction densities which bear no resemblance to the actual data densities. 
Furthermore, the top two panels in @fig-RQ1_model_outliers reveal that misspecification in the Normal LMM causes the model to be *more surprised* by 'extreme' entropy scores, leading to their identification as highly unlikely and influential observations. Consequently, the model is rendered unreliable due to the potential biases present in the parameter estimates.

In contrast, the Beta-proportion GLLAMM appears to effectively capture the data patterns, generating predictions within the expected data range. This is evident in @fig-RQ1_pred_speaker and complemented by @fig-RQ1_pred_speaker_model10 and @fig-RQ1_model_outliers. In @fig-RQ1_pred_speaker_model10, Model $10$ displays prediction densities that bear more resemblance to the actual data densities. Furthermore, the bottom two panels in @fig-RQ1_model_outliers show the model is *less surprised* by 'extreme' scores, fostering more trust in the model's estimates.

```{r}
#| label: fig-RQ1_pred_speaker
#| fig-cap: 'Entropy scores prediction for selected models. Note: Black dots show manifest entropy scores, orange dots and vertical lines show the point estimates and 95% highest probability density interval (HPDI) derived from Model 4, blue dots and vertical lines show similar information for Model 10.'
#| fig-height: 6
#| fig-width: 10

plot_speaker(d=data_H, # <1>
             stanfit_obj1=model04,
             stanfit_obj2=model10,
             p=0.95, decreasing=F,
             leg=c('model 04','model 10'))
```
1. user defined function: plot entropy scores and predictions for selected models

```{r}
#| label: fig-RQ1_pred_speaker_model04
#| fig-cap: 'Model 4: Entropy scores density for selected speakers. Note: Black bars denote the true data density, orange bars describe the predicted data density'
#| fig-height: 7
#| fig-width: 10

col_string = rep( rethink_palette[2], 2)
pred_speaker_pairs(speakers=c(20,8,11, 25,30,6), # <1>
                   d=data_H, stanfit_obj=model04,
                   p=0.95, nbins=20, col_string=col_string)
```
1.
user defined function: entropy and predicted scores density plot for selected model```{r}#| label: fig-RQ1_pred_speaker_model10#| fig-cap: 'Model 10: Entropy scores density for selected speakers. Note: Black bars denote the true data density, blue bars describe the predicted data density'#| fig-height: 7#| fig-width: 10col_string =rep( rethink_palette[1], 2)pred_speaker_pairs(speakers=c(20,8,11, 25,30,6), # <1>d=data_H, stanfit_obj=model10,p=0.95, nbins=20, col_string=col_string)```1. user defined function: entropy and predicted scores density plot for selected model```{r}#| label: fig-RQ1_model_outliers#| fig-cap: 'Outlier identification and analysis for selected models. Note: Thin and thick vertical discontinuous line indicate threshold of 0.5 and 0.7, respectively. Number pair texts indicate the observation pair of speaker and sentence index. '#| fig-height: 7#| fig-width: 10par(mfrow=c(2,2))plot_outlier(d=data_H, stanfit_obj=model01) # <1>plot_outlier(d=data_H, stanfit_obj=model04)plot_outlier(d=data_H, stanfit_obj=model07)plot_outlier(d=data_H, stanfit_obj=model10)par(mfrow=c(1,1))```1. user defined function: outliers identification for selected model## Estimation of speakers' latent potential intelligibility from manifest entropy scores (RQ2) {#sec-results_RQ2}The second research question aimed to demonstrate the application of the Beta-proportion GLLAMM in estimating the latent potential intelligibility of speakers. This was achieved by employing the general mathematical formalism outlined in @eq-beta_GLLAMM_structural, along with additional specifications provided in @tbl-fitted. 
The Bayesian procedure successfully estimated the latent potential intelligibility of speakers under Model $10$ through the structural equation:

$$
\begin{align}
SI_{i} = \alpha + e_{i} + u_{i}
\end{align}
$$ {#eq-beta_GLLAMM_structural_model10}

Moreover, due to its implementation under Bayesian procedures, Model $10$ provides the complete posterior distribution of the speakers' potential intelligibility scores. This provision, in turn, (1) enables the calculation of summaries, facilitating the ranking of individuals, and (2) supports the assessment of differences among selected speakers. In both cases, the model considers the inherent uncertainty of the estimates resulting from measuring intelligibility through multiple entropy scores.

@fig-RQ2_SImodel10 and the associated table display the ranking of speakers in decreasing order based on point estimates of the latent potential intelligibility. These estimates are accompanied by their associated $95\%$ highest probability density intervals (HPDI). Both the table and figure clearly indicate that speaker $6$ stands out as the least intelligible in the sample, followed farther behind by speakers $1$, $17$, and $9$. In contrast, the figure highlights speaker $20$ as the most intelligible, closely followed by speakers $23$, $31$, and $3$. Meanwhile, @fig-SI_contr_model10 and its associated table show summaries and the full posterior distribution for the comparison of potential intelligibility among selected speakers. The table and figure reveal that only the differences between speakers $6$, $1$, $17$, and $9$, along with the difference between speakers $20$ and $3$, are statistically significant, as their associated $95\%$ HPDI did not overlap with zero (shaded area).

```{r}
#| label: code-RQ2_SImodel10
#| fig-cap: ''

SI = pred_SI(d=data_H, stanfit_obj=model10, p=0.95) # <1>
SI = SI[order(SI$mean, decreasing=T), ]
SI[,c(1:5,9:10)]
```
1.
user-defined function: retrieves SI scores for selected models```{r}#| label: fig-RQ2_SImodel10#| fig-cap: 'Model 10, latent potential intelligibility of speakers. Note: Black dots and vertical lines show mean point estimates and 95% HPDI intervals.'#| fig-height: 4#| fig-width: 10plot_SI(d=data_H, stanfit_obj=model10, # <1>p=0.95, decreasing=T)```1. user defined function: plot ordered potential intelligibility score for speakers```{r}#| label: code-SI_contr_model10#| fig-cap: ''SI_contr =contrast_SI(d=data_H, stanfit_obj=model10, # <1> speakers=c(6,20), p=0.95, raw=T)idx_comp =c(1,21,13,52,60,6)SI_contr$SI_contr[idx_comp,c(1,5:6)]```1. user defined function: produce potential intelligibility contrast among selected speakers```{r}#| label: fig-SI_contr_model10#| fig-cap: 'Model 10, potential intelligibility comparisons among selected speakers. Note: Shaded area describes the 95% highest probability density interval (HPDI)'#| fig-height: 7#| fig-width: 10require(rethinking)par(mfrow=c(2,3))for(i in idx_comp){dens( SI_contr$SI_raw[[i]], xlim=c(-2.5,2.5), # <1>col=rgb(0,0,0,0.7), show.HPDI=0.95,xlab='Difference in potential intelligibility')abline( v=0, lty=2, col=rgb(0,0,0,0.3))mtext( text=names(SI_contr$SI_raw)[i], side=3, adj=0, cex=1.1)}par(mfrow=c(1,1))```1. density plot for the differences in potential intelligibility between selected speakers## Testing the influence of speaker-related factors on intelligibility (RQ3) {#sec-results_RQ3}This research question illustrates how theories on intelligibility can be examined within the model's framework. Specifically, the focus centers on assessing the influence of speaker-related factors on intelligibility, such as chronological age and hearing status. 
Notably, despite RQ1 indicating the suitability of Beta-proportion GLLAMM models for entropy scores, existing statistical literature suggests that, in certain scenarios, models incorporating covariate adjustment exhibit robustness to misspecification in the functional form linking an outcome and covariates, commonly referred to as covariate-outcome relationship [@Tackney_et_al_2023]. Consequently, this study compares all models detailed in @tbl-fitted. These models are characterized by different covariate adjustments on entropy scores or the latent potential intelligibility of speakers, namely chronological age and hearing status, while potentially exhibiting misspecification in the covariate-outcome relationship, as observed in the case of the Normal LMM. Similar to RQ1, all criteria consistently identify the Beta-proportion GLLAMM outlined in models $11$, $12$ and $10$ as the most plausible models for the data. The models exhibit the lowest values for both `WAIC` and `PSIS`, establishing them as the least deviating models among those under comparison. Moreover, @fig-RQ3_WAIC.PSIS depicts with horizontal blue lines the non-overlapping uncertainty for the models' `dWAIC` and `dPSIS` values. This reveals that, when compared to Model $11$, most models exhibit significantly distinct predictive capabilities. Models $12$ and $10$, however, stand out as exceptions to this pattern. This observation suggests that Models $11$, $12$, and $10$ display the least deviation from *perfect* predictive accuracy in contrast to the other models. 
Lastly, the `weight` of evidence in the tables underscores that Model $11$ accumulated the greatest support, followed by Model $12$ and, lastly, by Model $10$.

```{r}
#| label: code-RQ3_WAIC
#| fig-cap: ''

require(rethinking) # <1>
set.seed(12345) # <2>
RQ3_WAIC = compare( func=WAIC, # <3>
                    model01, model02, model03, model04,
                    model05, model06, model07, model08,
                    model09, model10, model11, model12 )
RQ3_WAIC = cbind( DIC=with(RQ3_WAIC, WAIC-2*pWAIC), # <4>
                  RQ3_WAIC )
round( RQ3_WAIC, 3)
```
1. package requirement
2. seed for replication
3. comparison of selected models with WAIC
4. DIC calculation

```{r}
#| label: code-RQ3_PSIS
#| fig-cap: ''

require(rethinking) # <1>
set.seed(12345) # <2>
RQ3_PSIS = compare( func=PSIS, # <3>
                    model01, model02, model03, model04,
                    model05, model06, model07, model08,
                    model09, model10, model11, model12 )
RQ3_PSIS = cbind( DIC=with(RQ3_PSIS, PSIS-2*pPSIS), # <4>
                  RQ3_PSIS )
round( RQ3_PSIS, 3)
```
1. package requirement
2. seed for replication
3. comparison of selected models with PSIS
4. DIC calculation

```{r}
#| label: fig-RQ3_WAIC.PSIS
#| fig-cap: 'WAIC and PSIS model comparison plot. Note: Black and blue points describe point estimates, and continuous horizontal lines indicate the associated uncertainty.'
#| fig-height: 12
#| fig-width: 6

par(mfrow=c(2,1))
plot_compare( compare_obj=RQ3_WAIC, # <1>
              ns=1, m='WAIC', dm=T )
plot_compare( compare_obj=RQ3_PSIS,
              ns=1, m='PSIS', dm=T )
par(mfrow=c(1,1))
```
1. user defined function: plot of Deviance, WAIC, PSIS, and dWAIC with confidence intervals

A closer examination of two models within this comparison set reveals the reasons behind the largest observed disparities. The Normal LMM, as outlined in Model $6$, continues to face challenges in capturing underlying data patterns, resulting in predictions that are physically inconsistent, falling outside the outcome's range. Additionally, the model persists in identifying highly unlikely and influential observations, making it inherently unreliable.
In contrast, the Beta-proportion GLLAMM described by Model $12$ appears to be less susceptible to 'extreme' scores, effectively capturing data patterns within the expected data range and thereby instilling greater confidence in the reliability of the model's estimates. This contrast is visually depicted in @fig-RQ3_pred_speaker, @fig-RQ3_pred_speaker_model06, @fig-RQ3_pred_speaker_model12, and @fig-RQ3_model_outliers.```{r}#| label: fig-RQ3_pred_speaker#| fig-cap: 'Entropy scores prediction for selected models. Note: Black dots show manifest entropy scores, orange dots and vertical lines show the point estimates and 95% highest probability density intervals (HPDI) derived from model 6, blue dots and vertical lines show similar information for model 12.'#| fig-height: 6#| fig-width: 10plot_speaker(d=data_H,stanfit_obj1=model06,stanfit_obj2=model12,p=0.95,decreasing=F,leg=c('model 06','model 12'))```1. user defined function: plot entropy scores and two selected models```{r}#| label: fig-RQ3_pred_speaker_model06#| fig-cap: 'Model 6: Entropy scores density for selected speakers. Note: Black bars denote the true data density, orange bars describe the predicted data density'#| fig-height: 7#| fig-width: 10col_string =rep( rethink_palette[2], 2)pred_speaker_pairs(speakers=c(20,8,11, 25,30,6), # <1>d=data_H, stanfit_obj=model06,p=0.95, nbins=20, col_string=col_string)```1. user defined function: entropy and predicted scores density plot for selected model```{r}#| label: fig-RQ3_pred_speaker_model12#| fig-cap: 'Model 12: Entropy scores density for selected speakers. Note: Black bars denote the true data density, blue bars describe the predicted data density'#| fig-height: 7#| fig-width: 10col_string =rep( rethink_palette[1], 2)pred_speaker_pairs(speakers=c(20,8,11, 25,30,6), # <1>d=data_H, stanfit_obj=model12,p=0.95, nbins=20, col_string=col_string)```1. 
user defined function: entropy and predicted scores density plot for selected model

```{r}
#| label: fig-RQ3_model_outliers
#| fig-cap: 'Outlier identification and analysis for selected models. Note: Thin and thick vertical discontinuous line indicate threshold of 0.5 and 0.7, respectively. Number pair texts indicate the observation pair of speaker and sentence index.'
#| fig-height: 7
#| fig-width: 10

par(mfrow=c(2,2))
plot_outlier(d=data_H, stanfit_obj=model05) # <1>
plot_outlier(d=data_H, stanfit_obj=model06)
plot_outlier(d=data_H, stanfit_obj=model11)
plot_outlier(d=data_H, stanfit_obj=model12)
par(mfrow=c(1,1))
```
1. user defined function: outliers identification for selected model

Considering the results in @fig-RQ3_WAIC.PSIS, the model comparisons favor three distinct models: Models $10$, $11$, and $12$. Model $10$, supported by $20.4\%$ of the evidence, estimates a single intercept $\alpha$ and no slope to explain the potential intelligibility of speakers (refer to associated table). In contrast, supported by $45.1\%$ of the evidence, Model $11$ estimates distinct intercepts for each hearing status group, namely $\alpha_{HS[1]}$ for `NH` speakers and $\alpha_{HS[2]}$ for the `HI/CI` counterparts, while maintaining a single slope that gauges the impact of age on potential intelligibility estimates. The $95\%$ HPDI for the comparison of intercepts $\alpha_{HS[2]}-\alpha_{HS[1]}$ reveals significant differences between `NH` and `HI/CI` speakers. Lastly, with evidence of $34.1\%$, Model $12$ estimates one intercept and slope per hearing status group, namely $\alpha_{HS[1]}$ and $\beta_{A,HS[1]}$ for the `NH` speakers, and $\alpha_{HS[2]}$ and $\beta_{A,HS[2]}$ for the `HI/CI` counterparts.
The $95\%$ HPDI for the comparison of intercepts and slopes reveals significant differences solely in the slopes between `NH` and their `HI/CI` counterparts ($\beta_{A,HS[2]}-\beta_{A,HS[1]}$).

However, a discerning reader may notice that these models yield conflicting conclusions regarding the influence of chronological age and hearing status on intelligibility. Model $10$ implies no influence of chronological age and hearing status on the potential intelligibility of speakers. A visual inspection of @fig-RQ3_intelligibility_model10, however, reveals the reason for the model's low support: Model $10$ fails to capture the prevalent increasing age pattern observed in potential intelligibility estimates. In contrast, Model $11$ identifies significant differences in potential intelligibility between `NH` and `HI/CI` speakers. The model further suggests that with the progression of chronological age, `HI/CI` speakers lag behind in intelligibility development, with no opportunity to catch up to their `NH` counterparts within the analyzed age range, as depicted in @fig-RQ3_intelligibility_model11. Finally, Model $12$ indicates no significant differences in intelligibility between `NH` and `HI/CI` speakers at $68$ months of age (around $6$ years old). However, the model reveals distinct evolution patterns of intelligibility per unit of chronological age between the hearing status groups, with `HI/CI` speakers displaying a slower rate of development compared to their `NH` counterparts within the analyzed age range. The latter is evident in @fig-RQ3_intelligibility_model12.

```{r}
#| label: code-parameter_model10
#| fig-cap: ''

print('Model 10')
par_int = c('a','aHS[1]','aHS[2]','bAm','bAmHS[1]','bAmHS[2]') # <1>
model_par = par_recovery(stanfit_obj=model10, # <2>
                         p=0.95, est_par=par_int)
model_par[,c(1,5:6)]
```
1. parameters of interest
2.
user defined function: recovers the hearing status and chronological age parameter estimates for selected model```{r}#| label: fig-RQ3_intelligibility_model10#| fig-cap: 'Model 10, Potential intelligibility per chronological age and hearing status. Note: Colored dots denote mean point estimates, vertical lines describe the 95% highest probability density intervals (HPDI), thick discontinuous line indicate the regression line, thin continuous lines denote regression lines samples from the posterior distribution, and numbers indicate the speaker index.'#| fig-height: 5#| fig-width: 10pred_intel(d=data_H, stanfit_obj=model10, # <1>p=0.95, ns=500, seed=12345)```1. user defined function: plot potential intelligibility per age and hearing status for selected model```{r}#| label: code-parameter_model11#| fig-cap: ''print('Model 11')par_int =c('a','aHS[1]','aHS[2]','bAm','bAmHS[1]','bAmHS[2]') # <1>model_par =par_recovery(stanfit_obj=model11, # <2>p=0.95, est_par=par_int)model_par[,c(1,5:6)]```1. parameters of interest2. user defined function: recovers the hearing status and chronological age parameter estimates for selected model```{r}#| label: code-contrast_model11#| fig-cap: ''contrs =contrast_intel(stanfit_obj=model11, p=0.95, # <1>rope=c(-0.05,0.05))contrs[,c(1,5:6)]```1. user defined function: extracts parameters of interest from selected model```{r}#| label: fig-RQ3_intelligibility_model11#| fig-cap: 'Model 11, Potential intelligibility per chronological age and hearing status. Note: Colored dots denote mean point estimates, vertical lines describe the 95% highest probability density intervals (HPDI), thick discontinuous line indicate the regression line, thin continuous lines denote regression lines samples from the posterior distribution, and numbers indicate the speaker index.'#| fig-height: 5#| fig-width: 10pred_intel(d=data_H, stanfit_obj=model11, # <1>p=0.95, ns=500, seed=12345)```1. 
user defined function: plot potential intelligibility per age and hearing status for selected model```{r}#| label: code-parameter_model12#| fig-cap: ''print('Model 12')par_int =c('a','aHS[1]','aHS[2]','bAm','bAmHS[1]','bAmHS[2]') # <1>model_par =par_recovery(stanfit_obj=model12, # <1>p=0.95, est_par=par_int) model_par[,c(1,5:6)]```1. user defined function: extracts parameters of interest from selected model```{r}#| label: code-contrast_model12#| fig-cap: ''contrs =contrast_intel(stanfit_obj=model12, p=0.95, # <1>rope=c(-0.05,0.05))contrs[,c(1,5:6)]```1. user defined function: extracts parameters of interest from selected model```{r}#| label: fig-RQ3_intelligibility_model12#| fig-cap: 'Model 12, Potential intelligibility per chronological age and hearing status. Note: Colored dots denote mean point estimates, vertical lines describe the 95% highest probability density intervals (HPDI), thick discontinuous line indicate the regression line, thin continuous lines denote regression lines samples from the posterior distribution, and numbers indicate the speaker index.'#| fig-height: 5#| fig-width: 10pred_intel(d=data_H, stanfit_obj=model12, # <1>p=0.95, ns=500, seed=12345)```1. user defined function: plot potential intelligibility per age and hearing status for selected model## Chain quality and information {#sec-results_Bquality}Given the considerable number of fitted models and the resulting abundance of parameters, this section opted to exclusively showcase the *quality* and *information* embedded in the Bayesian chains through models $6$ and $12$. The selection of these models is grounded in their parameter counts, with both registering the highest among those detailed in @sec-fitted. It is crucial to underscore that a meticulous examination of all fitted models was conducted. 
Notably, all models demonstrated comparable results to those specifically chosen for illustrative purposes.In general, both graphical analysis and diagnostic statistics indicated that all chains exhibited low to moderate autocorrelation, explored the parameter space in a seemingly random manner, and converged to a constant mean and variance in their post-warm-up phase. @fig-Rhats visualizes the $\widehat{\text{R}}$ diagnostic statistic and @fig-stationarity_plot1 through @fig-stationarity_plot8 illustrate the chain's graphical analysis.```{r}#| label: code-recovery_modelspar_int =c( 'a','aHS[1]','aHS[2]', # <1>'bAm','bAmHS[1]','bAmHS[2]', 'm_b','s_b','m_i','s_i','m_u','s_u','r_s','r_M','s_w','Mw')model06_parameters =par_recovery( # <2>stanfit_obj = model06,est_par = par_int,p =0.95 )model12_parameters =par_recovery( # <3>stanfit_obj = model12,est_par = par_int,p =0.95 )```1. parameters of interest2. user-defined function: displays concise parameter estimate information for selected model3. user-defined function: displays concise parameter estimate information for selected model```{r}#| label: fig-Rhats#| fig-cap: 'Selected models, Rhat values'#| fig-height: 5#| fig-width: 10par( mfrow=c(1,2) )plot( 1:nrow(model06_parameters), model06_parameters$Rhat4, # <1>ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),xaxt='n',xlab='', ylab='Rhat',main='Normal LMM: model 06')axis( side=1, at=1:nrow(model06_parameters), # <2>labels=rownames(model06_parameters),cex.axis=0.8, las=2 )abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) ) # <3>plot( 1:nrow(model12_parameters), model12_parameters$Rhat4, # <4>ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),xaxt='n',xlab='', ylab='Rhat',main='Beta-proportion GLLAMM: model 12')axis( side=1, at=1:nrow(model12_parameters), # <5>labels=rownames(model12_parameters),cex.axis=0.8, las=2 )abline( h=1.05, lty=2, col=rgb(0,0,0,0.3) ) # <6>par( mfrow=c(1,1) )```1. model 06: Rhat values plot2. model 06: parameters names in x-axis3. 
model 06: convergence threshold
4. model 12: Rhat values plot
5. model 12: parameters names in x-axis
6. model 12: convergence threshold

```{r}
#| label: fig-stationarity_plot1
#| fig-cap: 'Model 06, trace, trace rank and ACF plots for selected parameters'
#| fig-height: 4
#| fig-width: 10

tri_plot( stan_object=model06, # <1>
          pars=c('aHS[1]','aHS[2]') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot1.2
#| fig-cap: 'Model 12, trace, trace rank and ACF plots for selected parameters'
#| fig-height: 6
#| fig-width: 10

tri_plot( stan_object=model12, # <1>
          pars=c('a','aHS[1]','aHS[2]') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot2.1
#| fig-cap: 'Model 06, trace, trace rank and ACF plots for selected parameters'
#| fig-height: 4
#| fig-width: 10

tri_plot( stan_object=model06, # <1>
          pars=c('bAmHS[1]','bAmHS[2]') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot2
#| fig-cap: 'Model 12, trace, trace rank and ACF plots for selected parameters'
#| fig-height: 4
#| fig-width: 10

tri_plot( stan_object=model12, # <1>
          pars=c('bAmHS[1]','bAmHS[2]') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot3
#| fig-cap: 'Model 06, trace, trace rank and ACF plots for selected parameters'
#| fig-height: 4
#| fig-width: 10

tri_plot( stan_object=model06, # <1>
          pars=c('m_b','s_b') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot3.2
#| fig-cap: 'Model 12, trace, trace rank and ACF plots for selected parameters'
#| fig-height: 4
#| fig-width: 10

tri_plot( stan_object=model12, # <1>
          pars=c('m_b','s_b') )
```
1.
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model```{r}#| label: fig-stationarity_plot4.1#| fig-cap: 'Model 06, trace, trace rank and ACF plots for selected parameters'#| fig-height: 4#| fig-width: 10tri_plot( stan_object=model06, # <1>pars=c('m_i','s_i') )```1. used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model```{r}#| label: fig-stationarity_plot4#| fig-cap: 'Model 12, trace, trace rank and ACF plots for selected parameters'#| fig-height: 4#| fig-width: 10tri_plot( stan_object=model12, # <1>pars=c('m_i','s_i') )```1. used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model```{r}#| label: fig-stationarity_plot5#| fig-cap: 'Model 06, trace, trace rank and ACF plots for selected parameters'#| fig-height: 4#| fig-width: 10tri_plot( stan_object=model06, # <1>pars=c('m_u','s_u') )```1. used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model```{r}#| label: fig-stationarity_plot5.2#| fig-cap: 'Model 12, trace, trace rank and ACF plots for selected parameters'#| fig-height: 4#| fig-width: 10tri_plot( stan_object=model12, # <1>pars=c('m_u','s_u') )```1. used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model```{r}#| label: fig-stationarity_plot6#| fig-cap: 'Model 06, trace, trace rank and ACF plots for selected parameters'#| fig-height: 11#| fig-width: 10tri_plot( stan_object=model06, # <1>pars=c('r_s', paste0('s_w[', 1:4,']')) )```1. used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model```{r}#| label: fig-stationarity_plot7#| fig-cap: 'Model 12, trace, trace rank and ACF plots for selected parameters'#| fig-height: 11#| fig-width: 10tri_plot( stan_object=model12, # <1>pars=c('r_M', paste0('Mw[', 1:4,']')) )```1. 
used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model```{r}#| label: fig-stationarity_plot8#| fig-cap: 'Model 12, trace, trace rank and ACF plots for selected parameters'#| fig-height: 11#| fig-width: 10tri_plot( stan_object=model12, # <1>pars=paste0('SI[', 1:5, ']') )```1. used-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a modelMoreover, the density plots and $n_{\text{eff}}$ statistics collectively confirmed that all posterior distributions are unimodal distributions with values centered around a mean, generated with a satisfactory number of uncorrelated sampling points, making substantive sense compared to the models' prior beliefs. @fig-neff visualizes the $n_{\text{eff}}$ diagnostic statistic and @fig-histogram1 through @fig-histogram5 illustrate the chains' graphical analysis.```{r}#| label: fig-neff#| fig-cap: 'Selected models, neff values'#| fig-height: 5#| fig-width: 10par( mfrow=c(1,2) )plot( 1:nrow(model06_parameters), model06_parameters$n_eff, # <1>ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),xaxt='n',xlab='', ylab='Neff',main='Normal LMM: model 06')axis( side=1, at=1:nrow(model06_parameters), # <2>labels=rownames(model06_parameters),cex.axis=0.8, las=2 )abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) ) # <3>plot( 1:nrow(model12_parameters), model12_parameters$n_eff, # <4>ylim=c(0, 18000), pch=19, col=rgb(0,0,0,alpha=0.3),xaxt='n',xlab='', ylab='Neff',main='Beta-proportion GLLAMM: model 12')axis( side=1, at=1:nrow(model12_parameters), # <5>labels=rownames(model12_parameters),cex.axis=0.8, las=2 )abline( h=seq(0, 18000, by=2000), lty=2, col=rgb(0,0,0,0.3) ) # <6>par( mfrow=c(1,1) )```1. model 06: Neff values plot2. model 06: parameters names in x-axis3. model 06: convergence threshold4. model 12: Neff values plot5. model 12: parameters names in x-axis6. 
model 12: convergence threshold```{r}#| label: fig-histogram1.1#| fig-cap: 'Model 06, density plots for selected parameters'#| fig-height: 6#| fig-width: 10par_int =c('a','aHS','bAm','bAmHS') # <1>dens_plot(stanfit_obj=model06, pars=par_int, p=0.95) # <2>```1. parameters of interest2. used-defined function: generation of density plot with HPDI for selected parameters```{r}#| label: fig-histogram1#| fig-cap: 'Model 12, density plots for selected parameters'#| fig-height: 6#| fig-width: 10par_int =c('a','aHS','bAm','bAmHS') # <1>dens_plot(stanfit_obj=model12, pars=par_int, p=0.95) # <2>```1. parameters of interest2. used-defined function: generation of density plot with HPDI for selected parameters```{r}#| label: fig-histogram2#| fig-cap: 'Model 06, density plots for selected parameters'#| fig-height: 6#| fig-width: 10par_int =c('m_b','m_i','m_u','s_b','s_i','s_u') # <1>dens_plot(stanfit_obj=model06, pars=par_int, p=0.95) # <2>```1. parameters of interest2. used-defined function: generation of density plot with HPDI for selected parameters```{r}#| label: fig-histogram2.2#| fig-cap: 'Model 06, density plots for selected parameters'#| fig-height: 6#| fig-width: 10par_int =c('m_b','m_i','m_u','s_b','s_i','s_u') # <1>dens_plot(stanfit_obj=model12, pars=par_int, p=0.95) # <2>```1. parameters of interest2. used-defined function: generation of density plot with HPDI for selected parameters```{r}#| label: fig-histogram3#| fig-cap: 'Model 06, density plots for selected parameters'#| fig-height: 6#| fig-width: 10par_int =c('r_s','s_w') # <1>dens_plot(stanfit_obj=model06, pars=par_int, p=0.95) # <2>```1. parameters of interest2. used-defined function: generation of density plot with HPDI for selected parameters```{r}#| label: fig-histogram4#| fig-cap: 'Model 12, density plots for selected parameters'#| fig-height: 6#| fig-width: 10par_int =c('r_M','Mw') # <1>dens_plot(stanfit_obj=model12, pars=par_int, p=0.95) # <2>```1. parameters of interest2. 
used-defined function: generation of density plot with HPDI for selected parameters```{r}#| label: fig-histogram5#| fig-cap: 'Model 12, density plots for selected parameters'#| fig-height: 6#| fig-width: 10par_int ='SI'# <1>dens_plot(stanfit_obj=model12, pars=par_int, p=0.95) # <2>```1. parameters of interest2. used-defined function: generation of density plot with HPDI for selected parameters# Discussion {#sec-discussion}## FindingsThis study examined the suitability of the Bayesian Beta-proportion GLLAMM for the quantitative measuring and testing of research theories related to speech intelligibility using entropy scores. The initial findings supported the assertion that Beta-proportion GLLAMMs consistently outperformed Normal LMMs in predicting entropy scores, underscoring its superior predictive performance. The results emphasized that models neglecting the outcomes' measurement error and boundedness lead to underfitting and misspecification issues, even when robust features are integrated. This is clearly illustrated by the Normal LMMs.Secondly, the study showcased the Beta-proportion GLLAMM's proficiency in estimating the latent potential intelligibility of speakers based on manifest entropy scores. Implemented under Bayesian procedures, the proposed model offered a valuable advantage over frequentist methods by further providing the full posterior distribution of the speakers' potential intelligibility. This provision facilitated the calculation of summaries, aiding individual rankings, and supported the comparisons among selected speakers. In both scenarios, the proposed model accounted for the inherent uncertainty in the intelligibility estimates.Thirdly, the study illustrated how the proposed model assessed the impact of speaker-related factors on potential intelligibility. 
The results suggested that multiple models were plausible for the observed entropy scores, indicating that different speaker-related factor theories were viable for the data, with some presenting contradictory conclusions about the influence of those factors on intelligibility. However, even when unequivocal support for one theory was not possible, the divided support among these models indicated that certain statistical issues may be hindering the model's ability to distinguish among individuals and, ultimately, among models. These issues encompassed the insufficient sample size of speakers, the inadequate representation of the population of speakers, and the imprecise measurement of the latent variable of interest.

Ultimately, this study introduced researchers to innovative statistical tools that enhanced existing research models. These tools not only assessed the predictability of empirical phenomena but also quantitatively measured the latent trait of interest, namely potential intelligibility, facilitating the comparison of research theories related to this trait. However, the presented tools introduce new challenges for researchers seeking their implementation. These challenges emerge from two distinct aspects: one methodological and the other practical. In the methodological domain, researchers need familiarity with Bayesian methods and the principled formulation of assumptions regarding the data-generating processes and research inquiries. This entails understanding and addressing each of the data and research challenges within the context of a statistical (probabilistic) model. Conversely, in the practical domain, researchers need familiarity with probabilistic programming languages (PPLs), which are designed for specifying and obtaining inferences from probabilistic models, the core of Bayesian methods.
To ensure the successful utilization of these new statistical tools, this study addresses both challenges by providing comprehensive, step-by-step guidance in the form of this digital walk-through document.

## Limitations and future research

This study provides valuable insights into the use of a novel approach to simultaneously address the different data features of entropy scores in speech intelligibility research. However, it is important to acknowledge the limitations of this study and explore potential avenues for future research.

Firstly, the study interprets potential intelligibility as an unobserved latent trait of speakers influencing the likelihood of observing a set of entropy scores. These scores, in turn, reflect the transcribers' ability to decode words in sentences produced by the same speakers. Despite this practical approach, the construct validity of the latent trait depends heavily on the listeners' appropriate understanding and execution of the transcription task. Construct validity, as defined by Cronbach and Meehl [-@Cronbach_et_al_1955], refers to the extent to which a set of manifest variables accurately represents a concept that cannot be directly measured. Because the study assumes that the transcription task set by Boonen and colleagues [@Boonen_et_al_2021] was properly understood and executed, it expects potential intelligibility to reflect the overall speech intelligibility of speakers. However, this study does not delve into the general epistemological considerations regarding the connection between the latent variable and the concept.

Secondly, the study identified a notable absence of unequivocal support for any one of the compared models. This deficiency may be attributed to factors such as the insufficient sample size of speakers, the inadequate representation of the populations of speakers (referred to as selection bias), and the imprecise measurement of the latent variable.
Insufficient sample size and selection bias yield data with limited outcome and covariate ranges, leading to biased and imprecise parameter estimates [@Everitt_et_al_2010]. Furthermore, these issues, exacerbated by reduced measurement precision, can result in models with diminished statistical power and a higher risk of type I or type II errors [@McElreath_2020]. Consequently, future research should consider conducting power analyses for the proposed models. This entails assessing the impact of expanding the speakers' pool on testing research theories, or of increasing the number of speech samples, transcriptions, and listeners to enhance the precision of the potential intelligibility estimates. With these insights, future investigations should contemplate increasing the speaker sample with a group that adequately represents the population of interest. However, this must be done while remaining mindful of the pragmatic limitations associated with transcription tasks, specifically the costs and time-intensiveness of the procedure.

Thirdly, the study presented an illustrative example of the investigation of research theories within the model's framework. However, it did not offer an exhaustive evaluation of all factors influencing intelligibility, which are thoroughly explored in the works of Boons et al. [-@Boons_et_al_2012], Fagan et al. [-@Fagan_et_al_2020], Gillis [-@Gillis_2018], and Niparko et al. [-@Niparko_et_al_2010]. Consequently, the study cannot rule out the presence of unobservable variables that might bias the parameter estimates, potentially impacting the inferences provided. Hence, future research should consider integrating appropriate causal hypotheses about these factors into the proposed models, as proper covariate adjustment facilitates the production of unbiased and precise parameter estimates [@Cinelli_et_al_2021; @Deffner_et_al_2022].

Lastly, this study proposes two directions for future exploration in speech intelligibility research.
Firstly, there is an opportunity to investigate alternative methods for assessing speech intelligibility beyond transcription tasks and entropy scores. The experimental design of transcription tasks implies that the procedure may be time-intensive and costly. Thus, exploring less time-intensive or more cost-effective procedures that still offer comparable precision in intelligibility estimates could benefit researchers and speech therapists alike. An illustrative example of such a method is Comparative Judgment (CJ), where judges compare and score the perceived intensity of a trait between two stimuli [@Thurstone_1927]. In the context of the intelligibility trait, the stimuli under assessment could be the speech samples uttered by two speakers. CJ serves as an ideal example because the method has gained increasing attention within the realm of educational assessment, with several studies providing evidence for its validity in assessing various tasks within student work, as demonstrated by Pollitt [-@Pollitt_2012a; -@Pollitt_2012b], Lesterhuis [-@Lesterhuis_2018], van Daal [-@vanDaal_2020], and Verhavert et al. [-@Verhavert_et_al_2019]. <!-- psychological research field, as it has been demonstrated that various judgment types essentially involve comparisons, drawing on stimulus-to-stimulus representations stored in judge's memory to arrive at alternative scoring methods [@Lockhead_2004]. -->

Conversely, a second avenue for exploration involves integrating diverse data types and evaluation methods to assess individuals' intelligibility. This can be accomplished by leveraging two features of Bayesian methods: their flexibility and the concept of Bayesian updating. Bayesian methods possess the flexibility to simultaneously handle various data types. Additionally, through Bayesian updating, researchers can integrate information from the posterior distribution of parameters as priors in models for subsequent evaluations.
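The updating idea can be sketched with a minimal conjugate example (a hypothetical illustration, not one of the study's models): a Beta prior on a proportion-like trait is updated with Binomial counts, and the posterior from a first evaluation serves as the prior for the next.

```r
# Hypothetical Beta-Binomial illustration of Bayesian updating:
# the posterior after one evaluation becomes the prior for the next.
update_beta = function(prior, successes, trials) {
  # prior = c(shape1, shape2) of a Beta distribution; adding the counts of
  # successes and failures yields the conjugate Beta posterior
  c( prior[1] + successes, prior[2] + trials - successes )
}

post1 = update_beta( c(1, 1), successes=14, trials=20 )  # first evaluation
post2 = update_beta( post1,   successes=18, trials=20 )  # second evaluation

post2[1] / sum(post2)  # posterior mean of the trait, approx. 0.786
```

Here the flat Beta(1, 1) prior and the counts are illustrative only; in practice, the posterior obtained with one evaluation method would inform the prior of a model fitted to data from another.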
Ultimately, this could enable researchers to assess speakers' intelligibility progress without committing to a specific data type or evaluation method. This advancement could mirror the emergence of the second-generation Structural Equation Models proposed by Muthen [@Muthen_2001], where models facilitate the combined estimation of categorical and continuous latent variables. In the context of future research, however, the proposal would facilitate the estimation of latent variables using a combination of data types and evaluation methods, contingent upon those evaluation methods fulfilling construct validity.

<!-- Relatedly, the use of these models lead to a careful design and planning of the experiments designed to measure intelligibility. -->

# Conclusions {#sec-conclusions}

This study highlights the effectiveness of the Bayesian Beta-proportion GLLAMM in collectively addressing several key data features when investigating unobservable and complex traits, using speech intelligibility and entropy scores as an example. The results demonstrate that the proposed model consistently outperforms the Normal LMM in predicting the empirical phenomena. Moreover, it exhibits the ability to quantify the latent potential intelligibility of speakers, allowing for the ranking and comparison of individuals based on the latent trait while accommodating the associated uncertainties. Additionally, the proposed model facilitates the exploration of research theories concerning the influence of speaker-related factors on potential intelligibility. The study indicates that integrating and comparing these theories within the model's framework is a straightforward task. However, the introduction of these innovative statistical tools presents new challenges for researchers seeking implementation.
These challenges encompass the principled formulation of assumptions about the data-generating processes and research inquiries, along with the need for familiarity with the probabilistic programming languages (PPLs) essential for implementing Bayesian methods. Nevertheless, the study suggests several promising avenues for future research, including power analysis, causal hypothesis formulation, and the exploration and integration of novel evaluation methods for assessing intelligibility. The insights derived from this study hold implications for both researchers and data analysts interested in quantitatively measuring and testing theories related to nuanced, unobservable constructs, while also considering the appropriate prediction of the empirical phenomena.

# References

::: {#refs}
:::